THE ASSESSMENT OF POTENTIAL MUSICAL
ABILITY IN SECONDARY SCHOOL 
CHILDREN 


R. W. T. WHITTINGTON 


(Former Music Master, Otago Boys’ High School, Dunedin, New Zealand)


In order to ascertain the degree of musical ability in all first-year 
boys it was decided to carry out tests using material derived from 
similar investigations by Dr. C. H. Wing of London (16). By 
applying this material to two carefully selected groups it was 
hoped, inter alia, to evolve a shorter battery of tests which could 
be used to ascertain musical ability. 

While seeking a general approach to the question the author 
consulted past work in the field of musical ability, and on reading 
James L. Mursell’s The Psychology of Music, came across a statement
that formed the basic approach to his problem: “We must
try,” wrote Mursell, “our developed tests upon individuals known
to be conspicuously musical and those known to be conspicuously
nonmusical to try to discover where the most crucial and significant
performances are located. Strange to say,” he added, “this obvious
procedure has only been adopted by Stumpf and by Révész, and
in each case only a single individual subject was involved” (10).

Dr. Wing used a battery of seven tests which included tests of
perceptual efficiency and tests of musical preference. In some of 
his tests Wing’s approach was that of the Gestalt psychologists—a 
school of psychology which emphasized the perception of patterns, 
shapes, and tunes as wholes. The primary object of Wing’s research 
was “to examine the hypothesis of a general factor for musical 
ability, and to explore the influence of more specialized factors 
such as might obscure this general factor or interfere with its 
measurement.” Wing found that there was no great difference as
regards the efficiency of the individual tests, but found the phrasing
test, which had a saturation coefficient of nearly 0.80, the best.

In order to find the most significant differences in performance 
it was proposed to follow up Mursell’s premise and to try Wing’s 
tests upon individuals known to be conspicuously musical and 
those known to be conspicuously nonmusical. From this it was 
hoped to reduce Wing’s battery of seven tests to a more economical 
number—if possible to two or three. Most important, then, was 
the selection of the two groups. Mursell’s use of the phrase
“conspicuously musical” made it imperative to select only those pupils
who had proved themselves both in the theory and practice of 
music. Ages in this group ranged from thirteen years eleven months 
to eighteen years six months, the mean age being sixteen years 
three months. My idea was to select pupils from the sixteen to 
eighteen age group, for by doing so I was certain of obtaining 
only those who had advanced music certificates. However, four 
pupils under fifteen years of age were included in the group. I 
knew two of these pupils to be conspicuously musical, and informa- 
tion concerning the other two was given by their respective college 
music directors. Two pupils possessed Associate Diplomas of 
recognized music colleges, while two possessed the Licentiate. 
The remainder of the group had reached the higher grades and 
had gained either honours or merit in their theory and practical 
examinations. In all but four cases there was musical activity 
in the home. All twenty-four of the musical group were active 
participants in the musical life of their respective schools—orches- 
tras, bands, choral clubs, and so on. Five of the group had shown 
promise in the field of composition and had actually had their 
works performed in public. 

The second group was more difficult to select. To follow Mursell’s
premise faithfully this group had to be conspicuously nonmusical. 
Selection was based mainly on: first, complete dislike for music, 
and second, lack of musicality in parents, brothers, and sisters. 
For a final selection I used Wing’s questionnaire. Subjects were 
asked to place a tick against the word that most nearly described 
their general attitude towards music: (a) very interested; (b) 
interested; (c) indifferent; (d) dislike. 

They could add a plus or minus sign if they desired finer shades 
than those given. Secondly, the following was given to elicit
information about their own musical activities and those of their
parents, brothers, and sisters: Does anyone at home play an
instrument? If so, state who they are, what they play, and roughly how
much, e.g., “father, oboe, frequently.” They were asked to use the
terms “frequently,” “occasionally,” or “seldom” to describe the
amount of playing activity. For my final nonmusical group I
selected those pupils who: (a) placed a tick against dislike; (b) re- 
corded a “nil” answer to musical activity at home; (c) did not 
play an instrument and were not interested in learning to do so; 
(d) were not keen about school musical activities such as school 
singing (this was an individual question). Ages in this group ranged 
from fifteen years to seventeen years eleven months, the mean 
age being sixteen years six months. 

Besides these two main groups I included a group of three pupils
who, having had some years of music instruction, had given up 
because of lack of progress. I thought that, although a very small 
group, they might provide some further evidence to aid final 
analysis. For the experiment I would have liked a very much 
larger group, but the fact that testing had to be done out of school 
hours made it most difficult to collect a group enthusiastic enough 
to take the tests. As far as the musical group was concerned, 
directors of music of the secondary schools concerned sent only 
their best pupils. I considered this to be an excellent group. 

To ascertain ability to appreciate certain aspects of musical 
material, the complete battery of Wing’s tests was administered. 
This included chord analysis, pitch change, memory tests, rhythmic 
accent, harmony, intensity, and phrasing. While making prelimi- 
nary arrangements for the investigation, approach had been made 
to various music teachers and others interested in music and 
their opinions sought on the nature of musical ability. Quite a 
number were emphatic about the possession of good intelligence. 
However, one particular teacher of pianoforte made this reply, 
“It is all a question of head and hand.” The question of whether manual
dexterity would prove to be a factor prompted me to include the
Minnesota Rate of Manipulation Test. The first two individual
tests were used: placing, and turning and placing. The first test
involved the use of one hand, the second, two hands. To test
intelligence, Raven’s Progressive Matrices, 1947, was used.


ANALYSIS OF THE DATA


Although sixty pupils were tested, two groups of twenty-four
pupils each were selected for final diagnosis. The first task was to
ascertain significant differences between the musical scores of the 
two groups. The critical ratio proved to be 10.5. Similarly, critical 
ratios were obtained between the two groups on all musical tests: 


Test 1 7.02 
Test 2 8.78 
Test 3 6.50 
Test 4 4.27 
Test 5 11.25 
Test 6 4.64 
Test 7 3.42 


At the 0.01 level all the ratios were highly significant. The two 
groups were then regarded as distinct as far as their performances 
in the musical tests were concerned. The critical ratio between 
the percentile rankings on the Raven Intelligence, however, was 
0.30. There was thus no outstanding difference between the two 
groups at the 0.01 level, and for the purpose of the investigation 
the two groups were regarded as being homogeneous as far as the 
matrices intelligence test was concerned. 
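
The text does not show how the critical ratios were computed. As a minimal sketch, assuming the conventional definition (the difference between two group means divided by the standard error of that difference, for independent groups), the calculation might look as follows. The function name and the scores are hypothetical and are not the data of this study.

```python
import math

def critical_ratio(group_a, group_b):
    """Difference of means divided by the standard error of the difference.

    Assumes two independent groups and uses the unbiased sample variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    se_diff = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / se_diff

# Hypothetical test scores, for illustration only.
musical = [18, 17, 19, 16, 20]
nonmusical = [11, 12, 10, 13, 9]
cr = critical_ratio(musical, nonmusical)
```

On the normal curve a ratio of about 2.58 marks the two-tailed 0.01 level, which is why ratios such as 3.42 or 11.25 count as significant while the Raven ratio of 0.30 does not.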

The first part of the investigation was to follow up the Mursell 
premise. It was essential, of course, to deal separately with the 
two groups. By using factor analysis methods it was hoped that 
some factor or factors would be isolated which would indicate 
where the significant differences in performance lay. The next 
step, then, was to correlate the marks in the music tests and to 
factorize the correlations (Table I). 

Only one factor was isolated from this group, a factor that 
accounted for some twenty-eight per cent of the performance. 
The rather high correlation of the Raven percentile suggested 
that intelligence as measured by this test was operating in some 
form or other. One was rather diffident to speculate just what 
the factor might be. The first factor saturations on tests 2 and 7 
are high. Test 2 involved sensitivity to pitch relationships, and 
test 7 was one which required judgment of the more appropriate 
phrasing of a melody. Both tests involved a certain degree of
appreciation of relationships, and one might say that test 7 involved
not only the ability to educe relations but also correlates. This is,
of course, the idea underlying the matrices test. Test 3 had a
saturation of 0.54. The test involved detection of a change of
note in a short melodic phrase, a test which demanded retentivity.

TABLE I. CORRELATION MATRIX AND FIRST FACTOR SOLUTION
(Nonmusic group)

Test         1     2     3     4     5     6     7   Raven      Σ
1          0.31  0.28  0.10  0.31  0.24  0.10  0.27  0.21    1.82
2          0.28  0.63  0.34  0.51  0.02  0.00  0.50  0.63    2.91
3          0.10  0.34  0.46  0.13  0.14  0.46  0.27  0.32    2.22
4          0.31  0.51  0.13  0.51  0.08  0.15  0.25  0.20    2.14
5          0.24  0.02  0.14  0.08  0.28  0.00  0.28  0.17    1.21
6          0.10  0.00  0.46  0.15  0.00  0.46  0.22  0.00    1.39
7          0.27  0.50  0.27  0.25  0.28  0.22  0.50  0.40    2.69
Raven      0.21  0.63  0.32  0.20  0.17  0.00  0.40  0.63    2.56
Σ          1.82  2.91  2.22  2.14  1.21  1.39  2.69  2.56   16.94

First factor
loadings   0.44  0.71  0.54  0.52  0.29  0.34  0.65  0.62

The mean communality of the first factor of this group was 0.282.
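
The first factor loadings of Table I are consistent with the simple centroid rule, in which each column sum of the correlation matrix is divided by the square root of the grand total. The sketch below reproduces that arithmetic from the printed matrix. It assumes the diagonal cells are the communality estimates shown in the table, and the function name is illustrative only; the paper does not list its computational steps.

```python
import math

# Table I (nonmusic group). Order: tests 1-7, then Raven.
# Diagonal cells hold the communality estimates printed in the table.
TABLE_I = [
    [0.31, 0.28, 0.10, 0.31, 0.24, 0.10, 0.27, 0.21],
    [0.28, 0.63, 0.34, 0.51, 0.02, 0.00, 0.50, 0.63],
    [0.10, 0.34, 0.46, 0.13, 0.14, 0.46, 0.27, 0.32],
    [0.31, 0.51, 0.13, 0.51, 0.08, 0.15, 0.25, 0.20],
    [0.24, 0.02, 0.14, 0.08, 0.28, 0.00, 0.28, 0.17],
    [0.10, 0.00, 0.46, 0.15, 0.00, 0.46, 0.22, 0.00],
    [0.27, 0.50, 0.27, 0.25, 0.28, 0.22, 0.50, 0.40],
    [0.21, 0.63, 0.32, 0.20, 0.17, 0.00, 0.40, 0.63],
]

def first_centroid_loadings(matrix):
    """Column sums divided by the square root of the grand total."""
    size = len(matrix)
    column_sums = [sum(row[j] for row in matrix) for j in range(size)]
    total = sum(column_sums)
    return [s / math.sqrt(total) for s in column_sums]

loadings = first_centroid_loadings(TABLE_I)
rounded = [round(x, 2) for x in loadings]

# Mean of the squared loadings, i.e. the mean communality of the first factor.
mean_communality = sum(x * x for x in loadings) / len(loadings)
```

Rounded to two places, the computed loadings agree with the printed row, and the mean of their squares comes to about 0.28, in line with the reported mean communality.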

It is not often an easy matter to listen first to a melodic phrase 
of a given number of notes, and then when the phrase is repeated 
to say which note is changed. However, the mere fact that test 2 
involved a relationship of two notes—one higher or lower than 
the given note—and test 7 differentiated between staccato and 
legato playing may simply mean that both tests were measuring 
sensitivity. Whatever the factor operating through the perform- 
ances of the nonmusical group, it would appear to be a significant 
element in the Raven test performance. 

There appeared to be a factor which was responsible for some 
forty-four per cent of the performance in the musical group (Table 
II). The Raven loading was not as high in this group as it was 
in the case of the nonmusical group. Something was therefore 
operating through the performances of the musical group that 
was over and above intelligence as measured by the matrices 
test. Test 4 (judging the better rhythmic accent) and test 5 
(judging the more appropriate of the two harmonizations) were 
the most outstanding performances. As all the first factor loadings
were high, one may conclude a common factor running through
all the tests. A residual matrix was obtained and McNemar’s
criterion applied.¹

TABLE II. CORRELATION MATRIX AND FIRST FACTOR SOLUTION
(Musical group)

Test         1     2     3     4     5     6     7   Raven      Σ
1          0.66  0.13  0.66  0.57  0.39  0.39  0.46  0.36    3.62
2          0.13  0.54  0.17  0.54  0.47  0.37  0.44  0.18    2.84
3          0.66  0.17  0.66  0.64  0.53  0.31  0.16  0.42    3.55
4          0.57  0.54  0.64  0.65  0.65  0.46  0.40  0.40    4.31
5          0.39  0.47  0.53  0.65  0.65  0.37  0.57  0.47    4.10
6          0.39  0.37  0.31  0.46  0.37  0.52  0.41  0.52    3.35
7          0.46  0.44  0.16  0.40  0.57  0.41  0.57  0.20    3.21
Raven      0.36  0.18  0.42  0.40  0.47  0.52  0.20  0.52    3.07
Σ          3.62  2.84  3.55  4.31  4.10  3.35  3.21  3.07   28.05

First factor
loadings   0.68  0.54  0.67  0.81  0.77  0.63  0.60  0.58

The mean communality of the first factor of this group was 0.445.

What, then, was this factor affecting the performances in all 
the tests? I hesitated to call it “musical ability.” The group was a
homogeneous one, and the critical ratios between the two groups
were so highly significant that I could but regard the factor as
one of “musical experience”: experience plus an intellectual
element. This intellectual element I would not confuse with
Spearman’s “g.” The experiential element would certainly account for
the high saturation in test 5 (harmony), where ears trained in 
aural perception would easily detect those melodic lines in which 
chord progressions were obviously incorrect. The first factor 
loading in test 4 was 0.81. In this test Wing had included classical 
extracts which were obviously known by members of the musical 
group, for one had only to study their faces during the test perform- 
ance to realise that tunes were familiar. 





1 McNemar’s criterion: McNemar isolated factors until $\sigma_r$ reached or fell
below $1/\sqrt{N}$, where $\sigma_r = \sigma_s \div (1 - M_{h^2})$ and
$\sigma_s$ = SD of residuals after $s$ factors.
In the case of the residual matrix for the musical group $\sigma_r = 0.19$, which fell
below 0.20, i.e., $1/\sqrt{24}$. Thus only one factor could be isolated from
the performances of the musical group.
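
The stopping rule in the footnote reduces to one comparison, which can be checked with the two figures given there (a corrected residual SD of 0.19 and a group of 24). The following is only a sketch of that comparison; the function name is hypothetical.

```python
import math

def factoring_may_stop(corrected_sd, n_subjects):
    """True when the corrected SD of residuals is at or below 1/sqrt(N)."""
    return corrected_sd <= 1.0 / math.sqrt(n_subjects)

# Threshold for N = 24, which the footnote rounds to 0.20.
threshold = 1.0 / math.sqrt(24)

# Figures from the footnote: 0.19 for the musical group.
stop = factoring_may_stop(0.19, 24)
```

Because 0.19 lies below the threshold, no second factor would be extracted under this rule.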
And so, to attempt an answer to Mursell’s premise I could
only suggest that the differences in performance lay in the fact 
that, in the case of the musical group, “experience” played a 
most important part. 

It is worthy of note here that Wing obtained his three factors 
from an analysis of the results of his tests on a group of forty-three 
boys aged fourteen to sixteen. He chose this group because it 
was the largest one to which he had given the tests twice. His 
group was not, therefore, a homogeneous one. The mere fact 
that my nonmusical group lacked musical experience had, in all 
probability, accounted for low test scores. Worthy of note also 
was the fact that those subjects who took the tests but who were
rejected because they had learned an instrument, or because of 
some form of musical expression in the home, scored a little higher 
than members of the nonmusical group. The critical ratios of the 
three girls who had had some three to four years of musical in- 
struction were also significant and marked them off as a group 
whose performances in the music tests were much better than 
those of the nonmusical group. 

The second part of the investigation involved the selection of a
more economical battery of tests. Wing himself had stated that
he did not object to a smaller number of tests. But to select such 
predictive tests from the original battery of seven it appeared 
that certain criteria should be satisfied. Tests should yield signifi- 
cant correlations with musical age in both the musical and the 
nonmusical groups; as the factor accounting for the best
intercorrelations in the musical group appeared to involve an
appreciative experiential element, the predictive tests should have
comparatively low loadings in that factor; the intercorrelations 
among the predictive tests should be as low as possible. 

Tests 2, 3, and 7 appeared to satisfy these criteria very well: 
they yielded significant and relatively constant correlations with
musical age in both groups and their intercorrelations were low. 
When regression weights, determined by the Aitken-Thomson 
method of pivotal condensation, were applied, these three tests 
yielded most significant multiple correlations. Multiple correlation 
with the musical group was 0.92 and with the nonmusical group 
0.90. Tests yielding very high correlations only in the musical
group (tests 3, 4, and 5), when similarly weighted, failed to give
multiple correlations of any greater magnitude than the coefficients
yielded by those tests in the two groups. This suggested that
those particular tests were highly influenced by the “experiential
factor” isolated in the musical group. It appeared, then, that as
tests for selection purposes, tests 2, 3, and 7 were greatly to be
preferred and appeared to be promising predictive instruments.
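
The step from regression weights to a multiple correlation can be sketched in a few lines. The example below is an illustration under stated assumptions, not the author's working: ordinary Gaussian elimination stands in for the Aitken-Thomson pivotal condensation layout, the intercorrelations of tests 2, 3, and 7 are those printed in Table I, and the correlations with the criterion are hypothetical because the paper does not report them.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def multiple_correlation(intercorrelations, validities):
    """Return (standardized regression weights, multiple R)."""
    betas = solve_linear_system(intercorrelations, validities)
    r_squared = sum(b * v for b, v in zip(betas, validities))
    return betas, math.sqrt(r_squared)

# Intercorrelations of tests 2, 3, and 7 as printed in Table I.
R_TESTS = [
    [1.00, 0.34, 0.50],
    [0.34, 1.00, 0.27],
    [0.50, 0.27, 1.00],
]
# Hypothetical correlations of each test with the criterion (not from the paper).
HYPOTHETICAL_VALIDITIES = [0.60, 0.50, 0.55]

betas, multiple_r = multiple_correlation(R_TESTS, HYPOTHETICAL_VALIDITIES)
```

The lower the intercorrelations among the chosen tests, the more each one adds to the multiple correlation, which is the reasoning behind the third selection criterion.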

The manual dexterity tests failed to give any significant results. 
In the musical group the correlation of musical age and manual 
test 1 (placing) was 0.17, and with manual test 2 (turning and 
placing), 0.10. The correlations in the nonmusical group were 
lower still: 0.10, manual test 1, and —0.18 with manual test 2. 
No attempt was made to correlate the Raven percentiles with 
any of the manual dexterity tests. I had hoped that there would 
have been some significant correlation between musical age and 
manual test 2 which involved the use of both hands. Observation,
however, indicated that facility at the piano in no way helped to 
turn and place the blocks quickly—it appeared to be even a hin- 
drance. A pianist’s hands and fingers are trained to a certain 
pattern, and I found that most of the good pianists of the musical 
group fumbled and dropped blocks a good deal. The only one who 
performed really well was a Chinese lad in the nonmusical group. 
As far as manual test 1 was concerned, I thought that the good 
violin players in the musical group might have performed well as 
this test called for dexterity in the bow hand. Such, however, was 
again not the case. It appeared, then, that manual dexterity could 
be ruled out as a factor to be considered as far as musical
performance as measured by the Wing series of tests was concerned.


SUMMARY 


To assess musical ability and appreciation in the light of Mur- 
sell’s premise and to select an economical battery of Wing’s musical 
tests the following were administered: a) The Wing tests of musical 
intelligence, b) The 1947 Raven progressive matrices intelligence 
test (Australian Council of Educational Research), c) The Min- 
nesota tests of manual dexterity (tests 1 and 2). 

Tests were given to two carefully selected groups—musical, 
and nonmusical. 

Factor analysis on the lines described by Cyril Burt and using 
the Thurstone centroid method of factor analysis was applied. The 
results seemed to indicate that the musical group was superior to 
the nonmusical group because of musical experience, an experience
which accounted for some forty-four per cent of the performance.
In the nonmusical group one factor was isolated which possibly utilized
intelligence in a finer form to appreciate a “gestalt” quality.

Using the Aitken-Thomson method of pivotal condensation 
three tests yielded significant multiple correlations. These were 
tests 2, 3, and 7. 

The correlation of performances in the musical tests with the 
matrices intelligence test was significant but not high. 

Tests of manual dexterity were not to be regarded as playing 
any part in the diagnosis of musical ability as far as the Wing 
tests were concerned. 

Further developments are possible using tests 2, 3 and 7 on 
untrained groups who wish to become musicians, with subsequent 
assessment of the predictive value of these tests when reliable 
measures of musical progress are available.


THE EFFECT OF INDUCED TENSION DURING
TRAINING ON VISUAL FORM RECOGNITION¹ ²


SYLVIA R. MAYER* 


Physical Research Laboratories, Boston University 


Visual form recognition plays a central role in many educational
experiences ranging from the child’s early encounter with alpha- 
betical symbols to the adult’s identification of biological specimens 
or aircraft. Visual form learning, which underlies this important 
skill, is thereby an exceptionally fruitful area for research.
Numerous factors in visual learning affect the efficiency of recognition.
This study, however, deals only with the effect of induced muscular 
tension during learning upon subsequent recognition. 

Many empirical analyses and speculative accounts appear to 
support the proposition that induced muscular tension influences 
learning and that an optimal amount is facilitative. From this 
proposition two generalizations, which provide the rationale for 
this study, have been extended to the classroom learning situation. 
The first states that learning is facilitated by the arousal of a 
hypothetical condition of “attention” or tension which is both 
instigated and revealed by observable muscular activity, as for 
example in the maintenance of erect posture. Support for this 
view has been drawn from many laboratory studies (e.g., 1, 3, 4, 8) 
which have systematically induced tension by procedures such as 
weight lifting or dynamometer performance. Some of these ex- 
periments demonstrate facilitation, but many do not. These 
equivocal results have recently been partially clarified by a verbal
learning study (2) which demonstrates that induced tension facili- 
tates performance on a verbal task but has no effect on learning. 
This finding, however, cannot be generalized to the learning of
nonmeaningful visual forms without further qualification. The
present study aims to supply the latter.

1 This work, sponsored by the United States Air Force (Aerial
Reconnaissance Laboratory, Wright Air Development Center), was performed
by Boston University under Contract No. AF 33-038-15615.

2 Portions of this paper were presented at the 1955 meetings of the
American Psychological Association at San Francisco.

* This article reports a part of a dissertation presented to the Graduate
School of Boston University in partial fulfillment of the requirements for
the degree of Doctor of Philosophy.

The second generalization, which is concerned specifically with 
visual form learning, is an expansion of the first to include this 
broader notion: the pattern of the tension-inducing activity is 
important. This notion has been emphasized especially for teaching 
alphabetical symbols and numerals (6, 7) and biological and cul- 
tural forms (6, 9) by means of the traditional activity patterns of 
drawing and tracing. The assumption appears to be that if S
parallels visual activity with isomorphic manual activity, he
responds to more form detail than if he engages in visual activity 
alone. A systematic investigation of this view seems to be needed 
since the predicted facilitation attributed to these special patterns 
of activity could as well be the result of a generalized increase in 
tension level induced by the motor activity per se, or the greater 
amount of training time spent when drawing or tracing are re- 
quired. The latter possibility is explored in this study. 

This experiment is designed to examine the above two generali- 
zations in the context of visual form learning by comparing the 
effect on recognition of several conditions of training tension. One 
set of tension conditions is induced by activity which is unrelated 
in pattern to the visual forms being learned; this is represented by 
dynamometer performance at several pressure levels. The second 
set of conditions is induced by activity which is directly related 
in pattern to the training forms; this is represented by tracing and 
drawing. 


METHOD 


Subjects. Twenty-eight men, ranging in age from twenty-two to 
forty years, served as Ss. Each had 20/20 vision and habitually 
used his right hand for tasks such as writing and drawing. The 
group was made up of Air Force officers with various specialties, 
and technical and research personnel of Boston University Physical 
Research Laboratories. 

Apparatus. The apparatus consisted of four main components:
(a) an electronic, dual-controlled tachistoscope for the presentation
of visual forms; (b) a Smedley hand dynamometer in circuit with
a tachistoscope exposure switch and an auditory signal; (c) graphic
materials: 4 x 4 in. white cards for drawing, tachistoscope-aperture
covers of plexiglas for tracing, and black grease crayons; and (d)
seven ten-item, equated sets of training forms and recognition
tests. Each test card contained one form identical to the training
form and three similar ones. An example of training forms and
tests is shown in Figure 1.

Figure 1. An example of a training form (left) and recognition test
(right).

TABLE I
Presentation Order of Training Conditions

Subjects          1    2    3    4    5    6    7
1, 8, 15, 22      %    0    4    D    l¢   T    34
2, 9, 16, 23      0    4    D    16   3    34   lg
3, 10, 17, 24     %    D    34   T    \    v,   0
4, 11, 18, 25     D    34   T    i    \    0    \
5, 12, 19, 26     34   3    lg   4    0    le   D
6, 13, 20, 27     T    l¢   4    0    34   D    \%
7, 14, 21, 28     4    6    0    3%   D    yy   ry

Design. The design used required that each S learn one set of
forms in each of the seven conditions: 0, 1/8, 1/4, 1/2, and 3/4
maximum dynamometer tension; tracing (T); and drawing (D). The
order of presentation of conditions was as in Table I.

Procedure. The seven experimental sessions for each S took
place within five days, not more than two a day at a minimum
interval of three hours. At each session S performed on one set of
forms. In all conditions each of the ten items in a training set was
presented for fifteen seconds and spaced at twenty seconds. A 
ten-minute rest was interpolated between training and testing. 
The ten items in a recognition test were presented for five seconds 
each and spaced by ten seconds. S was required to select from 
four possibilities that form which he had memorized during train- 
ing. The number of correct recognitions served as the primary 
response measure. 

At the beginning of every experimental session S’s maximum 
dynamometer grip with his left hand was recorded at the end of
fifteen seconds of pressure. E then read instructions for the activity
of the condition and emphasized the need for careful examination 
of each form in anticipation of the subsequent recognition test. 
Three practice forms were presented by the method for the session 
followed immediately by the recognition test. Correction, addi- 
tional instructions and practice were provided so that S could 
achieve perfect performance on the three practice forms. 

A three-minute rest was allowed between practice and training. 
For the 1/8, 1/4, 1/2, and 3/4 tension conditions the maximum
dynamometer reading of the session was appropriately fractionated and
the adjustable tachistoscope switch was positioned on the
dynamometer dial at the resulting value. Each form was exposed in
the tachistoscope when S exerted the required pressure on the
dynamometer handle upon a signal from E. If S overshot or
undershot the range of ± three kilograms from the required point, the
buzzer sounded until S corrected his grip, but the visual form
remained.
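
The switch setting and the buzzer rule described above amount to two small calculations. The sketch below restates them in code; the function names, the 40 kg maximum grip, and the fraction of 0.5 are hypothetical values chosen only for illustration.

```python
def target_pressure(max_grip_kg, fraction):
    """Required pressure: the session's maximum grip scaled by the condition fraction."""
    return max_grip_kg * fraction

def buzzer_sounds(current_kg, required_kg, tolerance_kg=3.0):
    """True when the grip lies outside the allowed band around the required point."""
    return abs(current_kg - required_kg) > tolerance_kg

# Hypothetical session: a 40 kg maximum grip and a condition fraction of 0.5.
required = target_pressure(40.0, 0.5)

# Readings taken during one exposure (hypothetical): within band, over, under.
within_band = buzzer_sounds(22.5, required)
overshoot = buzzer_sounds(24.0, required)
undershoot = buzzer_sounds(16.5, required)
```

Since the form stayed visible while the buzzer sounded, a deviation only added a signal to S's task and did not shorten the fifteen-second exposure.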

In the drawing condition S copied each form onto a card posi- 
tioned beside the tachistoscope aperture; in the tracing condition 
he traced each form onto a plexiglas slide placed over the tachisto- 
scope aperture; in the zero-induced tension condition he observed 
each form, his two arms resting on the tachistoscope stand. Expo- 
sures were controlled by E in the latter three conditions. To 
prevent S from viewing his completed drawing or tracing, a small 
shutter on the headrest was released at the end of each exposure. 
During the twenty-second interval E replaced the soiled cards or 
plexiglas slides with clean ones. The shutter was used in all con- 
ditions to maintain experimental uniformity. 


RESULTS 


Table II presents the mean number of correct responses and 
standard deviations (SDs) in the seven training conditions. In- 
spection of this table and Figure 2 shows that (a) increased dy- 
namometer-induced tension is associated with decreased recogni- 
tion proficiency, (b) drawing is associated with an increase in 
recognition proficiency over zero-induced tension, and (c) tracing 
is associated with a decrease in recognition proficiency from zero- 
induced tension. 

TABLE II. RECOGNITION SCORES FOR ALL TRAINING CONDITIONS
(N in Each Condition = 28)

                              Training Condition*
Measure†               0     1/8    1/4    1/2    3/4     D      T
Mean                 6.14   5.28   5.00   4.75   4.75   7.46   4.78
Standard deviation   1.95   2.30   1.58   1.56   1.36   1.31   2.15

* 0, 1/8, 1/4, 1/2, 3/4 maximum dynamometer pressure; drawing (D);
tracing (T).

† Recognition scores are based on number of correct responses to 10
test items.

Figure 2. The effect of induced tension during training on visual form
recognition. (Ordinate: mean correct responses; N = 28 for each point.)

The analysis of variance of recognition scores with 0, 1/8, 1/4, 1/2,
and 3/4 dynamometer-induced tension reveals statistically significant
differences. The F obtained for tension conditions is 3.97 for
four and one hundred and eight df (p < 0.01), and the F for
individuals is 3.31 for twenty-seven and one hundred and eight
df (p < 0.001).

Also statistically significant was the analysis of variance for 
differences between drawing, tracing, and zero-induced tension 
with an F for conditions of 24.50 for two and fifty-four df (p < 
0.001), and an F for individual differences of 2.97 for twenty-seven 
and fifty-four df (p < 0.001). Bartlett’s chi-square test indicated 
these analyses to be appropriate. 
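
The degrees of freedom reported above are those of a conditions-by-subjects layout with one score per cell: five conditions and 28 subjects give 4, 27, and 108 df, and three conditions give 2, 27, and 54. A minimal sketch of such an analysis follows, assuming that layout. The function name and the small data set are hypothetical, and the individual scores of the study are not available here.

```python
def treatments_by_subjects_anova(scores):
    """scores[i][j] is the score of subject i under condition j (one score per cell)."""
    n_subjects = len(scores)
    n_conditions = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_subjects * n_conditions)

    condition_means = [sum(row[j] for row in scores) / n_subjects
                       for j in range(n_conditions)]
    subject_means = [sum(row) / n_conditions for row in scores]

    ss_conditions = n_subjects * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = n_conditions * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_conditions - ss_subjects

    df_conditions = n_conditions - 1
    df_subjects = n_subjects - 1
    df_residual = df_conditions * df_subjects

    # Both F ratios use the residual (conditions x subjects) mean square as error.
    ms_residual = ss_residual / df_residual
    return {
        "F_conditions": (ss_conditions / df_conditions) / ms_residual,
        "F_subjects": (ss_subjects / df_subjects) / ms_residual,
        "df": (df_conditions, df_subjects, df_residual),
    }

# Hypothetical scores for three subjects under two conditions.
example = treatments_by_subjects_anova([[1, 3], [2, 4], [3, 8]])
```

Removing the between-subjects variation from the error term is what allows consistent individual differences, such as those reported here, to be separated from the effect of the training conditions.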


DISCUSSION 


The two generalizations considered in this study are not sup- 
ported by the results. First, increases in tension level not only 
failed to facilitate learning but produced marked inhibition. 
Secondly, the form-related pattern of the tension-inducing activity 
does not appear to be a major determinant of visual form learning
since tracing results in recognition decrement while drawing alone
results in improvement in recognition over the zero-induced 
tension condition. 

Evidence from this study suggests that the effect of tension on 
learning may be determined by the difficulty and coordination
requirements of the tension-inducing activity. On the basis of 
Bourne’s findings for verbal learning (2), no change in form recog- 
nition accuracy was expected as a function of dynamometer 
induced tension. However, the occurrence of decrement in this 
study cannot be interpreted as evidence that the effect of such 
tension is different for visual form learning than for verbal learn- 
ing since Bourne’s experimental procedure differed from this 
study in an important detail. Bourne required only two-second 
intervals of pressure at a 1/4-maximum dynamometer tension level
as compared to the fifteen seconds of pressure from 1/8 to 3/4 levels
in this study. These differences in procedure and results suggest 
that the effect of dynamometer-induced tension is a function of 
the difficulty of the procedure: with limited activity, as in Bourne’s
study, the effect is null; with extensive involvement, as in this
study, the effect becomes interfering and hence inhibitory. This
speculation is further supported by the occurrence in this study
of increased response variation, indicated by dynamometer buzzer
signals, which appears to be related to amount of elapsed time
within an interval and to the degree of required pressure.
Similarly the decrement with tracing may be related to the difficulty
of coordination introduced both by the precise manual control
necessary to trace the small image, and by the fact that slight
head movements result in visual misalignment of image and
tracing. In contrast, coordination in drawing is relatively gross and
nonprecise.

The facilitation with drawing over the zero-induced tension 
condition cannot logically be referred to the form-related quality 
of the tension pattern in view of the decrement with tracing. 
Instead, a striking feature unique to the drawing condition is 
suggested as a possible determinant of the facilitation: S neces- 
sarily makes successive, discrete visual responses as he glances 
from the training form to his copy. It may be that this activity 
promotes an optimal distribution and duration of visual responses 
as compared to the zero-induced tension condition. This specula- 
tion awaits experimental clarification. 
SUMMARY


This experiment explores the effect of induced tension during 
learning on visual form recognition. Experimental conditions differ 
in terms of kind and amount of tension. In each training condition 
Ss learn forms which they later recognize from among similar ones. 

Recognition proficiency is at a maximum with drawing, at a 
minimum with tracing, and decreases systematically with increases 
in dynamometer-induced tension. Evidence is presented that 
suggests the inhibition phenomena are related to the coördination
requirements and difficulty of the tension-inducing activities. The 
facilitation phenomenon is tentatively referred to the nature of 
the visual responses accompanying drawing. 
CORRELATION ANALYSIS OF WISC
AND ACHIEVEMENT TESTS 


JAMES B. STROUD and PAUL BLOMMERS 


State University of Iowa 
State of Iowa Department of Public Instruction


The Wechsler Intelligence Scale for Children (WISC) has within 
its short history established itself as a popular test. It is now being 
used extensively in school psychological work. Yet insofar as the 
authors can ascertain there has been no large-scale investigation 
of the performance of this test in school situations. 

The primary purpose of this investigation was to determine the 
effectiveness with which all or various combinations of the WISC 
subtests could be used to predict performance on the Reading 
Comprehension, Arithmetic, and Spelling tests of the Iowa Tests 
of Basic Skills (ITBS) battery. The sample consisted of seven 
hundred and seventy-five pupils in Grades three to six. Stanford- 
Binet (S-B) I.Q.’s were also available for six hundred and twenty- 
one of these pupils, making possible comparisons between WISC 
and S-B as predictors of performance on the three achievement 
tests for a large sample of elementary school pupils. 

The pupils in this sample were drawn from a twenty-county area 
in Iowa. The numbers of pupils from cities, towns, consolidated and 
rural schools were roughly proportional to the distribution of pupils 
in these categories. The WISC tests were administered by Lauber 
in connection with her work as a regional school psychologist for 
the twenty-county area. The ITBS battery was administered as a 
part of the University of Iowa’s state-wide testing program. Some 
of the S-B tests were administered by Lauber, but a majority were 
administered by local school psychologists and supervisors. The 
tests were administered over a three-year period, beginning with 
the school year 1952-53. The ITBS battery was administered in 
all participating schools in January of each year. The WISC tests 
were administered to the pupils in this sample within three months
(before or after) of the date on which they took the ITBS. This 
time relationship was not observed in the case of the S-B. Such
S-B scores as were available were utilized, regardless of the time
at which the test was administered.¹

The pupils in this sample were all “referred” by teachers and/or
school officials for psychological interviews and examinations. All 
of them were in, or were thought to be in, some kind of school diffi- 
culty. The mean Full Scale WISC I.Q. of the girls in this sample 
was 87.3; that of the boys, 89.7; and that of both sexes, 89.0. The 
within-grades S.D. of the I.Q.’s was 14.0 for girls and 14.8 for boys,
and the within-grades within-sexes S.D. was 14.51. The mean I.Q.
of the sample is considerably below average, although the S.D.
approaches the value for the general population. The pupils in the 
sample tend to be somewhat overage for their respective grades. 
The mean chronological ages by grades at the time WISC was 
administered were as follows: Grade three, CA 9:1; Grade four, 
CA 10:2; Grade five, CA 11:7; Grade six, CA 12:7. Also the pupils 
in the sample were, on the average, at least a year retarded in the
three achievement areas studied. Perhaps it is reasonable to suppose
that this sample is representative of the population to which we
would most commonly wish to generalize in school practice. Pupils 
presenting some kind of school problem make up the bulk of those 
receiving individual psychological examinations. 

As has just been indicated, all subjects studied were referrals. 
The problem categories and frequency of classification, as taken 
from the regional school psychologist’s files, were as follows for the 
two sexes combined: Mental Retardation, sixty-three; Educational 
Retardation (not mentally retarded, but requiring remedial instruc- 
tion, especially in reading, and teacher’s assistance in structuring 
learning activities), two hundred and sixty-eight; Personality Devi- 
ations, apparently of long standing, twenty-five; Transient Per- 
sonality Deviations (although pupil may react rather violently to 
frustrations, he responds to improved environment), one hundred 
and sixty-one; Emotional Problems, evincing serious disturbance, 
one hundred and eighty-eight; miscellaneous, some showing at least 
superficial symptoms of brain damage, seventy. 





1 For a random subsample of one hundred and fifty-six the mean time 
separation for the two administrations was two years and 10.4 months. In 
every case in this subsample S-B was administered first. 
PROCEDURE


(1) Within-grades zero order correlations were computed, by 
sex and within-sexes, between WISC Verbal, Nonverbal, and Full 
Scale I.Q.’s and each of the achievement tests, and between S-B
I.Q.’s and the achievement tests. 

(2) Within-grades intercorrelations, by sex and within-sexes, were 
computed involving S-B I.Q.’s and WISC Verbal I.Q.’s, Non- 
verbal I.Q.’s, and Full Scale I.Q.’s. 

(3) Within-grades zero order intercorrelations were computed
among WISC subtest raw scores and the achievement tests. 

(4) Beta weights for a within-grades regression analysis involving 
Reading, Arithmetic and Spelling as dependent variables and WISC 
subtest raw scores as independent variables were computed, and 
tested for significance.² The nonsignificant weights were eliminated
in turn, beginning with the weight for which the t ratio was least, 
and regression equations determined. Multiple correlations between 
the achievement tests and various combinations of the WISC sub- 
tests were computed. 

(5) In order to test the predictions of the various regression 
equations on a sample other than that from which they were de- 
rived, a cross-validation study was carried out on an independent 
sample of one hundred and ninety-nine pupils drawn in the same 
way as the original sample. 
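
The “within-grades” correlations of steps (1) to (3) are most naturally read as pooled within-group correlations, in which each score is taken as a deviation from the mean of its own grade before the usual product-moment formula is applied. The sketch below assumes that reading; the function name and the toy scores are illustrative only and are not data from this study.

```python
from collections import defaultdict
from math import sqrt


def within_groups_r(groups, x, y):
    """Pooled within-groups Pearson r: center x and y on their group means first."""
    by_group = defaultdict(list)
    for g, xi, yi in zip(groups, x, y):
        by_group[g].append((xi, yi))

    sxx = syy = sxy = 0.0
    for pairs in by_group.values():
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for two grades (not data from the study).
grades = [3, 3, 3, 4, 4, 4]
iq = [1, 2, 3, 10, 11, 12]
reading = [2, 4, 6, 1, 3, 5]
r_within = within_groups_r(grades, iq, reading)
```

Centering on grade means removes between-grade differences in level, so the coefficient reflects only the relationship among pupils of the same grade.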


RESULTS 


(1) The within-grades correlations, by sex and within-sexes, be- 
tween WISC Verbal I.Q.’s, Nonverbal I.Q.’s, Full Scale I.Q.’s, 
S-B I.Q.’s and each of the three achievement tests are presented in 
Table I. 

It is noteworthy that the r between WISC Nonverbal I.Q.’s and 
reading was somewhat higher than that between Verbal I.Q.’s and 
reading. The difference is statistically significant. With arith- 
metic and spelling the Verbal I.Q.’s yielded higher r’s than the 
Nonverbal I.Q.’s. The difference is significant only in the case of 
arithmetic. WISC Full Scale I.Q.’s correlated with each of the 
achievement test scores to a higher extent than did S-B I.Q.’s.
Although the differences are significant, they should be interpreted 





2 A five per cent significance level was used throughout this investigation. 
TABLE I—WITHIN GRADES CORRELATIONS BETWEEN ITBS
SCORES AND VARIOUS MEASURES OF INTELLIGENCE

                   Females: N = 205       Males: N = 416         Within Sexes
Test Variables     Read.  Arith. Spell.   Read.  Arith. Spell.   Read.  Arith. Spell.

W: V I.Q.            …      …      …        …      …      …        …    0.67   0.62
W: Nv I.Q.         0.67   0.55   0.56     0.62   0.50   0.62     0.63   0.52   0.60
W: V + Nv I.Q.     0.69   0.65   0.60     0.65   0.66   0.71*    0.66   0.66   0.67
S-B I.Q.           0.59   0.58   0.51     0.59   0.63   0.66*    0.59   0.61   0.61

* Difference between corresponding r’s for sexes statistically significant.
All other differences between sexes were nonsignificant.


in the light of the time interval between administrations of S-B and 
ITBS. As noted earlier, the maximum time interval between the 
administration of WISC and ITBS was only three months. The 
approximate mean time interval between administration of S-B 
and WISC was almost three years. This average is also indicative 
of the time interval between administrations of S-B and ITBS. 

(2) Within-grades intercorrelations, by sex and within-sexes, 
between the various I.Q.’s are given in Table II. The magnitude of 
the correlation (r = 0.94) between WISC Full Scale I.Q.’s and 
S-B I.Q.’s was a little greater than one might have anticipated in 
view of the time interval between the administration of the two 
tests. 

(3) Within-grades intercorrelations among WISC subtest raw 
scores and ITBS scores are shown in Table III. It may be seen 


TABLE II—WITHIN GRADES CORRELATIONS BETWEEN
VARIOUS MEASURES OF INTELLIGENCE

                  Females: N = 205       Males: N = 416         Within Sexes
                  W: Nv  W: FS  S-B      W: Nv  W: FS  S-B      W: Nv  W: FS  S-B
Test Variables    I.Q.   I.Q.   I.Q.     I.Q.   I.Q.   I.Q.     I.Q.   I.Q.   I.Q.

W: V I.Q.         0.73   0.92   0.87     0.64   0.91   0.87     0.67   0.91   0.87
W: Nv I.Q.               0.93   0.85            0.90   0.83            0.91   0.83
W: FS I.Q.                      0.93                   0.94                   0.94

TABLE III—WITHIN GRADE INTERCORRELATIONS: WISC AND ITBS

(N = 775)
Test 1 ee 4 5 6 7 8 10 | 11 |) 12 od 13 

1, W: Information 0.48 0. 49/0. 47/0. 60,0. 13 0.25, 0. eed 3 .31 0. -290. .21)0. p a 

2. W: Comprehension 0.400.500. 53,0. 150.34) 0. 410. 39.0. 36/0. 1 43.0 440. . 
3. W: Arithmetic 0.39,0. 46 0. 200. .28 0. 30 0. 36,0. 34/0. 2 46.0. -70 0. 50 
4. W: Similarities 0.49 0. 150. -330. 310. 350. 29/0. a 43,0. 41 
5. W: Vocabulary 0.080.270. 39,0. 30/0. 110. ; 410. 45,0. 41 
6. W: Digit Span 0.12 0. 15,0. 14 0. 13 C. 13/0. 20 0. 26 0. 23 
7. W: Picture Comp. 0.340.360. -28,0.1 2910. 26.0. 28 
8. W: Picture Arrangement 0.36 0. 440.24 -430. 36 0. 39 
9. W: Block Design 0.500. a. 45 0. 51 
10. W: Object Assembly 0. 26/0. £9 0. 41 0. 51 
11. W: Coding .31/0.29 0.31 
12. ITBS: Reading Comp. 0.73 0.84 
13. ITBS: Arithmetic 0.77 
14. ITBS: Spelling { 

that Digit Span yielded comparatively low r’s not only with the 
three achievement tests but also with the other subtests. This 
test did, however, in subsequent analyses yield significant beta 
weights*. The r’s between both Picture Completion and Coding 
and the achievement tests were also relatively low. Coding, like 
Digit Span, yielded significant beta weights, but the weights for 
Picture Completion were nonsignificant. The r’s between Block 
Design and reading and between Object Assembly and reading are 
exceptionally high. These two r’s are significantly larger than the 
r’s between reading and all other WISC subtests. The r’s between 
spelling and WISC Block Design, Object Assembly and arithmetic 
were significantly higher than those between spelling and all other 
subtests. The r’s between arithmetic and WISC Digit Span, Picture
Completion, and Coding were significantly lower than those 
between arithmetic and all other WISC subtests except Picture 
Arrangement; as would be expected, the r between arithmetic and 
WISC Arithmetic was significantly higher than any of the r’s 
between this variable and all other WISC subtests. 

In general the intercorrelations among the WISC subtests for 
this sample were lower than those reported by Wechsler. However, 
the relative magnitudes (i.e., rank order) of the coefficients were 
practically the same as obtained by Wechsler. 





* The results of the regression analyses are presented in (4). 
(4) Multiple regression analyses based on the within-grades
intercorrelations reported in Table III were performed with ITBS 
Reading, Arithmetic, and Spelling, as dependent variables, and all 
WISC subtest raw scores, as independent variables. Beta weights 
were tested for statistical significance and the non-significant 
variables successively eliminated. The six tests which carried
significant weights in predicting achievement in reading and spell- 
ing were WISC Arithmetic, Digit Span, Vocabulary, Block Design, 
Object Assembly, and Coding. In the case of arithmetic achieve- 
ment these six tests were joined by WISC Comprehension. Among 
these subtests Comprehension (in the case of the arithmetic cri- 
terion), Coding, and Digit Span were actually of negligible practical 
significance even though statistically significant. The beta-weights 
and multiple correlation coefficients for various combinations of 
WISC tests are shown in Table IV. 
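
For standardized (beta) weights, the squared multiple correlation equals the sum of each weight times the corresponding zero-order r with the criterion. The sketch below applies that identity to the four-test equation for ITBS Arithmetic, using beta weights as read from Table IV and r’s as read from Table III in this copy. Both tables are rounded to two places, so this is only an approximate check, and the function name is illustrative.

```python
from math import sqrt


def multiple_r(betas, validities):
    """R = sqrt(sum of beta_i * r_iy) for standardized regression weights."""
    return sqrt(sum(b * r for b, r in zip(betas, validities)))


# WISC Arithmetic, Vocabulary, Block Design, Object Assembly vs. ITBS Arithmetic.
betas = [0.53, 0.14, 0.14, 0.14]       # Table IV, four-test column (as read)
validities = [0.70, 0.45, 0.45, 0.41]  # Table III, ITBS Arithmetic column (as read)
r_arith = multiple_r(betas, validities)
```

With these rounded inputs the value comes out close to the multiple R of 0.75 reported for the four-test combination.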

It is of particular interest that for all practical purposes the 
same set of WISC tests (Arithmetic, Vocabulary, Block Design, 
and Object Assembly) are the most effective predictors of each 
achievement test investigated. Moreover, the R’s obtained between 
the least squares composites of these four subtests (raw scores) 
and each of the criterion tests were practically as high as those 
based on all eleven tests (raw scores) and the criterion tests. Also 
the r’s between the unweighted sums of the raw scores of these 


TABLE IV—MULTIPLE R’S AND BETA-WEIGHTS FOR WITHIN-
GRADES REGRESSION ANALYSES (N = 775)

                     Reading               Arithmetic              Spelling

Multiple R’s    0.74   0.74*  0.73*    0.76   0.76*  0.75*    0.70   0.69*  0.68*

Inf. β         −0.01                  −0.01                   0.01
Comp. β         0.02                   0.06   0.07            0.01
Arith. β        0.11   0.12   0.13     0.50   0.50   0.53     0.20   0.20   0.23
Sim. β          0.07                   0.06                   0.05
Vocab. β        0.19   0.22   0.22     0.09   0.11   0.14     0.17   0.20   0.21
D.S. β          0.06   0.06            0.10   0.10            0.10   0.10
P. Com. β      −0.04                  −0.04                  −0.02
P. Ar. β        0.04                   0.03                   0.03
B.D. β          0.25   0.25   0.26     0.13   0.12   0.14     0.20   0.21   0.22
O.A. β          0.36   0.38   0.40     0.10   0.11   0.14     0.26   0.28   0.30
Cod. β          0.06   0.07            0.06   0.06            0.08   0.09

* All weights statistically significant.
TABLE V—WITHIN GRADES CORRELATIONS OF VARIOUS TYPES
BETWEEN ITBS TESTS AND SELECTED COMBINATIONS
OF WISC TESTS





N = 775 


N = 775 N = 621 
WISC Tests er |e. | ee 
(Raw Scores) (Raw Scores) | (Scale Scores) 
Reading 
Eleven W: tests 0.742 0.69 0.65 
(6V + 5Nv) 
Verbal 0.52 0.59 
Nonverbal 0.66 0.63 
3, 5, 6, 9, 10, 11† 0.74 0.68
3,5, 9, 10 0.73 0.69 
Arithmetic 
Eleven W: tests (6V + 0.76 0.63 0.66 
5Nv) 
Verbal 0.61 0.68 
Nonverbal 0.53 0.51 
2, 3, 5, 6, 9, 10, 11 0.76 0.63 
3,5, 9, 10 0.75 0.61 
Spelling 
Eleven W: tests (6V + 0.70 0.65 0.68 
5Nv) 
Verbal 0.55 0.63 0.63 
Nonverbal 0.60 0.60 0.60 
3, 5, 6, 9, 10, 11 0.69 0.65 
3, 5, 9, 10 0.68 0.64 

* Verbal prorated at 5/6.

† 2 = Comprehension
  3 = Arithmetic
  5 = Vocabulary
  6 = Digit Span
  9 = Block Design
  10 = Object Assembly
  11 = Coding
TABLE VI—COMPARISONS BETWEEN WITHIN-GRADES CROSS-VALIDATION
R’S (N = 199) AND R’S FOR ORIGINAL SAMPLE (N = 775)

                                             Cross-validation   Original sample
Criterion           WISC Tests               R (N = 199)        R (N = 775)

ITBS: Reading       3, 5, 6, 9, 10, 11           0.69               0.74
                    3, 5, 9, 10                  0.69               0.72
ITBS: Arithmetic    2, 3, 5, 6, 9, 10, 11        0.68               0.76
                    3, 5, 9, 10                  0.69               0.74
ITBS: Spelling      3, 5, 6, 9, 10, 11           0.65               0.69
                    3, 5, 9, 10                  0.66               0.68





four subtests and the criterion tests are practically as high as 
those between the unweighted raw score sums for all eleven tests 
and the criterion tests (see Table V). As a further step, I.Q.’s 
were determined for these four tests from the WISC I.Q. table 
upon the basis of 10/4 times the sum of their scale scores. The 
r between the I.Q.’s thus derived from this abridged battery and 
the I.Q.’s derived from the complete battery (eleven tests) was 
0.85. The r between Verbal I.Q.’s (six tests prorated) and I.Q.’s 
based upon 5/2 times the sum of the scale scores on WISC Arith- 
metic and Vocabulary was 0.83; that between Nonverbal I.Q.’s 
(five tests) and I.Q.’s based upon 5/2 times the sum of the scale 
scores on Block Design and Object Assembly was 0.82. 
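
The proration described above amounts to scaling a partial sum of scale scores up to the number of tests the I.Q. table expects: 10/4 times the sum for four tests against a ten-test table, and 5/2 times the sum for two tests against a five-test scale. A minimal sketch follows; the function name and the scale scores are hypothetical, and the final conversion to an I.Q. requires the published WISC table, which is not reproduced here.

```python
def prorated_scale_sum(scale_scores, full_count):
    """Scale a partial sum of scale scores up to the number of tests the table expects."""
    return sum(scale_scores) * full_count / len(scale_scores)


# Hypothetical scale scores for Arithmetic, Vocabulary, Block Design, Object Assembly.
four_test_sum = prorated_scale_sum([10, 12, 9, 11], full_count=10)  # 10/4 times the sum

# Two-test versions use 5/2 times the sum (five tests expected per scale).
verbal_sum = prorated_scale_sum([10, 12], full_count=5)
```

The prorated sum is then looked up in the I.Q. table exactly as a sum from the complete battery would be.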

The r’s between (1) WISC raw score sums, (2) WISC scale score 
sums, and (3) WISC I.Q.’s and the three achievement tests are 
given in the last three columns of Table V. To facilitate compari- 
son, the R’s shown in Table IV are repeated in Column 1 of this 
table. As was noted above, it may be observed that the r’s between 
the sums of unweighted raw scores of WISC Tests 3, 5, 9, and 10 
and the three achievement tests are relatively high. This suggests 
that the school practitioner might do about as well to administer 
WISC Tests 3, 5, 9, and 10, multiply the scale score sums by 10/4 
and consult the table for conversion of full scale scores into WISC 
I.Q.’s, as to proceed in the conventional way. This, of course, is 
on the assumption that the scores made on these four subtests 
are unaffected by taking the other tests in the battery. The as- 
sumption may not be warranted. To obtain definitive evidence 
one should have to administer these four subtests as a four-test 
battery. 
(5) Cross-validation. The data in the preceding section tell us
how the regression equations perform for the sample from which 
they were derived. More important is the way they perform, 
once derived, for an independent sample. To ascertain this, an 
independent sample of one hundred and ninety-nine pupils was 
drawn from the same population as the original sample. Within- 
grades correlations were computed between various combinations 
of WISC tests⁴ and the three criterion measures. The results are
presented in Table VI. For convenience in comparison, the last 
column of Table VI shows the multiple R’s between various com- 
binations of WISC subtest unweighted raw scores and the three 
criterion tests for the original sample. Over all, only a small shrink- 
age resulted. The average difference between the correlations for 
the original sample and the cross-validation sample is about 0.04. 
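
The shrinkage figure can be checked directly from the six pairs of coefficients in Table VI. The sketch below uses the values as they are legible in this copy of the table; the dictionary layout and variable names are illustrative.

```python
# (cross-validation R, original-sample R) pairs as printed in Table VI.
table_vi = {
    ("Reading", "3, 5, 6, 9, 10, 11"): (0.69, 0.74),
    ("Reading", "3, 5, 9, 10"): (0.69, 0.72),
    ("Arithmetic", "2, 3, 5, 6, 9, 10, 11"): (0.68, 0.76),
    ("Arithmetic", "3, 5, 9, 10"): (0.69, 0.74),
    ("Spelling", "3, 5, 6, 9, 10, 11"): (0.65, 0.69),
    ("Spelling", "3, 5, 9, 10"): (0.66, 0.68),
}

# Shrinkage = original-sample R minus cross-validation R, per row.
shrinkage = [original - cross for cross, original in table_vi.values()]
mean_shrinkage = sum(shrinkage) / len(shrinkage)
```

With these two-place values the mean difference is between 0.04 and 0.05, in line with the figure given in the text.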

Comments. In this investigation WISC gave evidence that its 
popularity is well-deserved. The results justify the wide use of 
this test in school situations. Some of the subtests, however, seem 
to contribute very little to the prediction of school achievement 
as measured. It would appear, therefore, that serious attention 
should be given to the possibility of developing an abbreviated 
battery. Our results suggest the use of Arithmetic, Vocabulary, 
Block Design, and Object Assembly. It is recognized that such 
an abbreviated scale would have limited use. There is no intent 
to imply that it should be substituted for the full test in all situ- 
ations. 

The WISC in keeping with current practice in test design pro- 
vides both verbal and nonverbal scores and subtest scores in each 
category, so as to permit differential prediction. Insofar as our 
analyses of data go, there does not appear to be a great deal of 
support for this practice with respect to the WISC tests. The 
total score, as represented by Binet’s concept of a “general average” 
is still far and away the most revealing score, so far as prediction 
of academic achievement is concerned. 





4 The least squares weights based on the original sample were used.
A COMPARISON OF INSTRUCTION BY KINE-
SCOPE, CORRESPONDENCE STUDY, AND
CUSTOMARY CLASSROOM 
PROCEDURES 


THOMAS S. PARSONS


University of Michigan 


Growing interest in the integration of television into universities’
on- and off-campus instructional programs has begun to pose many
questions concerning the efficacy of “canned” kinescope presenta-
tions in relation to more conventional methods. Husband (1), in
a comparison of live TV, kinescope, and conventional classroom 
presentations of similar course material, has produced results 
indicating superior achievement for the kinescope presentation. 
However, his report does not reveal the statistical significance of 
this superiority, the relative size of his classroom, kinescope, or 
TV-viewing groups, nor statistical evidence of the original com-
parability of his experimental groups. Evans, Roney, and McAdams
(2), in a comparative study of achievement scores for groups taught 
by conventional lecture, TV-plus-correspondence, and TV-plus- 
classroom discussion methods, found no statistically significant 
differences among any of these groups. 

The present paper compares kinescope-correspondence study 
groups without an instructor, conventional classroom discussion 
experience with an instructor, and independent correspondence 
study for their relative effectiveness as methods of providing 
university-level instruction in identical subject matter (psychology 
of child development). Course effectiveness is measured before, 
during, immediately after, and approximately four months follow- 
ing the course in terms of academic achievement, certain group 
processes, and students’ attitudes toward the experience. 


SUBJECTS 


Forty university upperclassmen served as subjects. Twenty 
students enrolled in the experimenter’s developmental psychology 
course were randomly assigned to one of three groups, each of which 
received instruction according to one of the experimental treat-
ments described above. The kinescope group was subdivided into
two small viewing groups. Twenty students concurrently enrolled
in the experimenter’s educational sociology course were given no
consistent instruction in the experimental subject matter and were
used as controls.


PROCEDURES 


Common to all three experimental treatments (but not to the 
control group) was the course textbook (3), a syllabus-workbook 
including fourteen written assignments to be completed and 
graded during the course, and several auxiliary manuals covering 
specific topics such as sociometry. Identical subject matter was 
thus presented in the same sequence in each group. The classroom 
group, however, met with the experimenter at regular class sessions 
twice a week, whereas the kinescope groups each met approxi- 
mately once a week in a drawing room to view filmed TV lectures 
on the current course topics. Members of the correspondence 
group worked entirely independently, viewed no filmed presenta- 
tions, and—like the kinescope group—maintained only postal 
contact with the instructor through the fourteen written assign- 
ments and the midsemester examination. These were regularly 
corrected and returned with appropriate comments. The experi- 
menter served as instructor both in the classroom and through 
correspondence—roles in which he had had substantial experience 
prior to this study. 

The control group was compared with the experimental groups 
initially, midway, and at the end of the study for relative achieve- 
ment of the experimental subject matter. The three experimental 
groups were compared laterally for between-groups differences— 
and longitudinally for within-groups differences whenever ap- 
propriate—in seven dependent variables. These, as suggested 
above, included achievement, group cohesiveness, congruence of 
course-associated opinions, the degree of perceived intragroup 
structure or interpersonal dependability, preferences among the 
three experimental conditions of instruction, composite grades for 
the course, and subjects’ evaluations of this course’s personal 
value as compared with others taken concurrently. Data were 
collected prior to active instruction, halfway through the course,
and immediately following the course by use of standard multiple-
choice achievement tests, a modified Libo-Schachter cohesiveness
questionnaire, a sociometric test (“With whom would you most
(least) like to discuss topics covered in the course?”), and ad hoc
questionnaires and rating scales. In addition, a “retention” 
achievement test, sociometry, and questionnaires were adminis- 
tered about four months after the course. Analyses incorporated 
chi-square, t, and the analysis of variance as measures of the 
significance of observed differences, and C and #—where applica- 
ble—as crude measures of relationships among the independent 
and dependent variables. 
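
Assuming that C here denotes Pearson’s coefficient of contingency, it is obtained from chi-square as C = sqrt(chi-square / (chi-square + N)). The sketch below computes both statistics for a two-way table of counts; the table shown is hypothetical and the function names are illustrative.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square and total N for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N))."""
    chi2, n = chi_square(table)
    return sqrt(chi2 / (chi2 + n))


# Hypothetical 3 x 3 table: assigned treatment (rows) by preferred treatment (columns),
# with every subject preferring the treatment to which he was assigned.
perfect = [[7, 0, 0], [0, 6, 0], [0, 0, 7]]
c_max = contingency_coefficient(perfect)
```

For a 3 x 3 table C cannot exceed the square root of 2/3, or about 0.816, which is worth keeping in mind when reading values of C obtained from three-category classifications.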


RESULTS 


Achievement. (See Figure 1.) No significant differences in achieve- 
ment appeared among any groups at the outset, nor among the 
three experimental groups’ scores on the midsemester. At the end 
of the semester, however, the difference in achievement between 
the experimental groups and the control group was significant 
at P = 0.001; whereas there remained no significant differences 
among experimental groups. Similarly, scores based on an evalua- 
tion of all written work done throughout the course showed no 
significant differences among the experimental groups. 

Four months after the course, however, a difference just barely 
significant at P = 0.05 was found which favored the correspondence 
group over the classroom group, although no significant differences 
appeared at this time between the classroom group and the kine- 
scope group or between the kinescope group and the correspondence 
group. A comparative analysis of the net gains in achievement
retained by the three experimental groups four months after the
course (“retention” test scores minus precourse test scores) yielded
an F ratio which was definitely not statistically significant; and a
comparison of the three groups for differences in the amount of
post-course change in tested achievement (post-course test scores
minus retention test scores) produced an F ratio which was simi-
larly far from statistical significance.

Figure 1. Mean achievement of the three experimental groups on the
pre-experimental, post-experimental, and retention tests.

Group cohesiveness. Members of the correspondence group were 
never administered the cohesiveness questionnaire since its con- 
tents would have been meaningless to them; their actual cohesive- 
ness as a group was assumed to be nil. Cohesiveness scores were 
obtained from the classroom and kinescope groups, however, and 
means for these two groups were not significantly different at any 
point during the semester. When the classroom group’s mean was 
adjusted by eliminating one highly deviant score on the post-course 
measurement (produced by a psychiatrist’s daughter) it rose 
considerably—above the mean for the kinescope group. Yet, even 
after this adjustment the difference between the two groups 
at this point remained far from significant. 

Figure 2. Convergence of course-associated opinions for the three ex-
perimental groups on the pre-course, post-course, and durability examina-
tions.

Congruence of course-associated opinions. (See Figure 2.) On the
assumptions that friendly interaction among individuals tends to
cause their opinions about the central topics of interaction to
converge—and that this convergence of opinions is as likely to
occur for invalid or wrong opinions as for valid opinions—measure-
ments of the congruence of course-associated opinions were derived
for the three experimental groups from detailed item analyses of
their initial, final, and “retention” achievement tests. These meas-
ures were expressed as a “congruence ratio”:


Sum of multiple-choice categories in which incorrect answers were given
-----------------------------------------------------------------------
                       Sum of incorrect answers


2.8 





and the ratios for the three groups were examined for longitudinal 
differences within groups and for differences between groups. 
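
One possible reading of the congruence ratio is sketched below. It assumes that, for each test item, the numerator counts the distinct incorrect options chosen by any member of the group, and that the denominator counts every incorrect answer given. This interpretation, the function name, and the sample responses are assumptions made for illustration and are not taken from the original scoring procedure.

```python
def congruence_ratio(responses, answer_key):
    """responses: one list of chosen options per student; answer_key: correct option per item."""
    wrong_categories = 0  # distinct incorrect options used, summed over items
    wrong_answers = 0     # total incorrect answers given
    for item, correct in enumerate(answer_key):
        wrong = [student[item] for student in responses if student[item] != correct]
        wrong_categories += len(set(wrong))
        wrong_answers += len(wrong)
    if wrong_answers == 0:
        return None       # ratio undefined when nobody errs
    return wrong_categories / wrong_answers


# Hypothetical two-item test taken by three students.
key = ["a", "b"]
clustered = [["c", "b"], ["c", "d"], ["a", "d"]]  # errors fall on the same options
scattered = [["c", "b"], ["d", "c"], ["a", "d"]]  # each error falls on a different option
ratio_clustered = congruence_ratio(clustered, key)
ratio_scattered = congruence_ratio(scattered, key)
```

Under this reading a value of 1 means that every wrong answer fell in its own category, and smaller values mean that wrong answers clustered on the same options.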

Although a statistically significant (P < 0.05) decrease in 
congruence was indicated for the correspondence group during 
the course, no significant longitudinal changes were found for the 
other two (presumably high interaction) groups either during or 
after the course, or for the correspondence group after the course. 
Similarly, no significant differences in opinion congruence were 
found among any of the groups at any point when compared 
laterally, even though the small kinescope groups were consistently 
highest and the interactionless correspondence group was con- 
sistently lowest in opinion congruence both during and after the 
course. 

Group structure. (See Figure 3.) On the assumption that group 
members’ willingness to choose others as performing or not per- 
forming functions central to the group’s purpose is a correlate of 
their perceptions of the group’s structure, measures of group struc-
ture or the dependability of rôle expectations were derived from
composite (positive plus negative) sociometric subject scores for
the classroom and kinescope groups. These showed a significant
(P = 0.05) increase during the semester for the classroom treat-
ment and a decrease which was not quite significant at P = 0.05
for the kinescope treatment. Similarly, lateral or between-groups
comparisons of the classroom and kinescope groups on this variable
indicated no significant differences at the beginning or middle of
the semester, but a difference significant at beyond P = 0.01 by the
semester’s end. The delayed comparison of these two groups for
durability of effect—or retention—showed that the kinescope
group’s sociometric choices had increased and the classroom
group’s had decreased, thus bringing them closer together than at
the end of the course in their potentials for interpersonal depend-
ability. Nevertheless, even at this point the difference between the
two groups in this variable remained significant at far beyond
P = 0.05.

Figure 3. Mean group structure scores (sum of “choice” and “rejec-
tion” sociometric subject scores) for the classroom and kinescope groups on
pre-course, mid-course, and durability measurements.

A source of error in this comparison of group structure was 
introduced by the fact that for the kinescope group n = 7, while 
for the classroom group n = 6, thus consistently giving the kine- 
scope group a greater chance to make sociometric choices than the 
classroom group. However, any adjustment for this disparity 
would tend to lower the kinescope group’s or raise the classroom 
group’s indicated structure, thus uniformly increasing the between- 
groups differences described above. 

Methods preferences. (See Figures 4a, b, and c.) Students’ preferences
among the three experimental instructional treatments, as
measured both prior to and immediately following the course, bore
a stable, highly significant positive relationship to the treatments
to which they had actually been arbitrarily assigned. (At both
times C = +0.80, @ = +0.86; P = 0.001.) Almost all individual
statements of preference which deviated from this tendency were
accounted for by members of the kinescope and correspondence
groups who preferred the classroom experience, although this
countertendency progressively decreased throughout the course
for the correspondence group and tended to increase during the
course for the kinescope group.

Figure 4. Methods preferences for the three experimental groups on
the pre-course, post-course, and durability measurements. a (upper),
choices for classroom (accustomed experience); b (middle), choices for
kinescope (partly unaccustomed experience); c (lower), choices for
correspondence (totally unaccustomed experience).
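
The coefficient C quoted for this relationship is presumably Pearson's contingency coefficient, C = sqrt(chi-square / (chi-square + N)), computed from a table of assigned treatment by stated preference. That reading is an assumption, and the underlying counts are not given in the text, so the table below is purely hypothetical. A minimal Python sketch:

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N))."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical counts (NOT the study's data):
# rows = assigned treatment, columns = preferred treatment,
# in the order classroom, kinescope, correspondence.
hypothetical = [
    [10, 0, 0],
    [2, 8, 0],
    [1, 0, 9],
]

# A table in which every subject prefers the assigned treatment.
perfect = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 0, 10],
]

print(round(contingency_coefficient(hypothetical), 3))
print(round(contingency_coefficient(perfect), 3))
```

For a 3 x 3 table C cannot exceed sqrt(2/3), about 0.816, which is the value the `perfect` table attains; a C near 0.80 would therefore sit close to the ceiling of the coefficient.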

Four months after the course this high relationship remained 
stable and substantially unaltered (C = +0.78, 6 = +0.82; 
P = 0.01), although some reduction in reliability resulted from 
the use of combined measures for the analyses of pre- and post-
course preferences¹ while simple measures were used in the analysis
of retained preferences. Once again, almost all countertendency 
to the obtained relationship was contributed by an equal number 
of students in the correspondence and kinescope groups who stated 
preferences for the classroom treatment, thus stabilizing the per- 
formance of these two groups on the immediate post-course 
measurement and ending their respective trends toward and away 
from their own experimental treatments. 

Course evaluations. The students’ immediate post-experimental 
evaluations of this course in comparison to others taken concurrently
revealed that on a three-point scale from “more valuable”
(1) to “less valuable” (3) the mean ratings were 1.0 for the class- 
room group, 1.7 for the kinescope group, and 1.4 for the correspond- 
ence group. 

Analysis of the relationship between these evaluative ratings
and the subjects’ experimental treatments, arranged in order of
most (classroom) to least (correspondence) similarity to their
accustomed educational backgrounds, yielded a small positive
relationship (C = +0.60, 6 = +0.56) which was not significant
at P = 0.05. Four months later these ratings had not significantly
altered (classroom group M = 1.2; kinescope M = 1.7; correspondence
M = 1.7), nor had the relationship between them and the
students’ instructional treatments changed remarkably (C =
+0.44, @ = +0.38; P > 0.50). However, a tendency appeared for
students in all but the kinescope group (which was rated lowest
immediately after the course) to lower their ratings slightly in the
direction of “about average value.”

¹ On the assumption that the mid-course measurements would serve as
the best estimate of the mid-point of change during the course, the pre- and
post-course analyses of methods preferences were based on n’s of
thirty rather than of twenty. These were obtained by using pre-course
measures plus one-half of the mid-course measures for the pre-course analysis,
and by using post-course measures plus one-half of the mid-course measures
for the post-course analysis. Additional tests were run with uncombined
data to determine the amount of spurious correlation introduced by this
multiple use of individuals’ preferences, and no substantial amount was
found.


CONCLUSIONS 


Throughout the following discussion it should be borne in mind 
that the findings presented above were based upon experimental 
manipulations which involved never more than forty subjects. 
Thus, even though subjects were randomly distributed among 
experimental groups and experimentally produced differences 
were usually tested by the potent F ratio, any interpretation of 
these findings may not inspire the degree of confidence which would
be possible if a larger N had been subjected to similar procedures. 

Therefore, to the extent that we can consider these findings a 
reliable representation of adult students’ reactions to the three 
experimental treatments it may be concluded that: 

(a) None of the three instructional methods is significantly 
more effective than the others in producing terminal achievement. 
All of the methods, as might be expected, are better in this respect 
than exposing the student to no systematic instruction at all. 
In addition, a literal interpretation of this analysis would indicate
that the post-instructional retention of pure factual achievement 
tends very slightly to favor the correspondence method—and 
perhaps the kinescope method—over the classroom treatment. 
(See Figure 1.) Thus, if one is willing to consider pure tested achieve- 
ment an adequate index of the desired outcomes of instruction, it 
tentatively appears that kinescope (or television) instruction is as 
effective as class discussion methods for the teaching of abstract 
or highly verbal subject matter. Furthermore—and this may be of 
interest to those contemplating the expense of educational TV 
transmission—it seems that independent correspondence study is 
at least as effective in this respect as kinescope instruction, and 
probably more so than the classroom. Some may feel that this
finding suggests an expedient solution to the problem of over- 
crowded classrooms and understaffed teaching faculties. 

The reason for this superiority of the correspondence method 
may not be hard to find. Certainly the correspondence group in 
particular had no opportunity to be distracted from the factual 
content of the text (on which the examination was largely based) 
by the “interesting implications,” “personal applications,” and
group interaction phenomena with which the class members were 
bombarded. On the other hand, a close examination of the data 
indicates that this significant difference in retained achievement 
may be due in part to a progressive spreading of group achievement 
means which were already separated (though not quite signifi- 
cantly) at the outset. The fact that there were almost no differences 
among the groups’ net gains in achievement during the course and 
among their post-course losses in achievement would tend to 
support this latter interpretation. 

Finally, one might argue that tested achievement, alone, is a 
meager measure of the total desired outcomes of instruction. Such 
an argument would lead to a comparative examination of personal 
satisfaction, of the amount of opinion change, or of certain group 
interaction variables before a final conclusion could be drawn 
concerning relative instructional effectiveness. 

(b) The amount of group structure, cohesiveness, acceptance
of others’ opinions, and certain other presumed outcomes of group
interaction (see Figures 2 and 3) show no statistically reliable
differences between the classroom and kinescope treatments during
the course except for what appears to be a very reliable (and
conservative) indication that the classroom-with-instructor method
tends to produce group structure or dependable rôle expectations
among members at a faster rate and to a higher degree than does
the instructorless kinescope method (Figure 3). And this difference
is quite durable; for even though a convergent or regressive
effect is produced by the lapse of time following active instruction,
the classroom group still retains significantly more rôle
interdependability than the kinescope group after approximately
four months of separation.

It could be argued that the classroom group’s perceived structure
is narrowly related to the course’s formal content, and that
if the kinescope group met as many hours as the classroom group,
or if this variable were measured by a sociometric criterion broader
than “With whom would you most (least) like to discuss topics
covered in the course?” the obtained superiority of the classroom
group might be nullified or reversed. Such a finding, in fact, would
be much easier to reconcile theoretically with the kinescope group’s
slight superiority in both cohesiveness and opinion convergence.
It would rectify the appearance given here of a negative relationship
between these two variables and the structuredness of group
members’ rôle expectations. Nevertheless, even though the classroom
instructor may deliberately establish channels for intercommunication
on matters predominantly concerning the course’s
formal content (whereas members of the instructorless group
might spend more of their time in attempting to establish a more
universally applicable set of interpersonal rôle expectancies), it
would seem that the academic purposes of any instructional group
which perceives itself as such must remain the hub of both intragroup
communication and of the rôle expectancies which become
central to the group’s processes. In addition, the theoretical objection
raised above is considerably weakened when one recalls,
first, that the omission of a single highly deviant cohesiveness
score raises the classroom group substantially above the kinescope
group on this variable, and second, that between-groups differences
in both cohesiveness and opinion convergence were not statistically
significant either before, during, or after the course. To the objection
that the meeting hours of the kinescope group were considerably
fewer than those of the classroom group it may be answered
that this disparity of “contact hours” has been a common characteristic
of the kinescoped or televised reproduction of a college
course, and that this comparison is still, therefore, a valid one.
Thus the classroom treatment’s superiority as a method for producing
durable interpersonal rôle expectancies or perceived structure
appears to stand up under both logical and statistical analyses.

A final note concerning the rough validity of these group inter- 
action measures is that they appear to be supported by the inter- 
actionless correspondence group’s significant decrease in opinion 
convergence throughout the course, and its position as the lowest 
of the three groups in this variable by the end of active instruction 
(Figure 2). Following the course, when contact was also lost among 
members of the formerly “high interaction” classroom and kine- 
scope groups, all three groups’ measured opinion convergence 
rose, probably as a result of practice effect.² However, the
correspondence group remained consistently below the others in this
variable even though its post-course rate of gain was greater. 

(c) Students’ preferences among instructional methods appear 
to be mightily conditioned, first, by what they are immediately 
involved in, and secondly, by what they have become accustomed 
to in the past (Figures 4a, b, and c). In all cases the factor of 
immediate involvement is the most powerful determinant of 
methods preferences. Thus students in all three experimental 
groups stated a preference for the treatment to which they had 
been arbitrarily assigned far more than for any other treatment or 
treatments. However, where the immediate experience differs in 
relatively small degree from the accustomed instructional pattern 
(as in the kinescope group) the initial appeal of novelty soon 
diminishes for some students and the factor of custom ultimately 
becomes more important in determining preferences. Conversely, 
to the extent that the immediate experience is more essentially 
different from customary patterns (as in the correspondence 
treatment) this experience appears to become incomparable with 
these patterns—to be evaluated according to new standards; and 
the factor of immediacy or present involvement becomes relatively 
more important. Where the factors of immediacy and custom are 
working in the same direction (as in the classroom treatment) 
there emerges an apparently unanimous preference for the im- 
mediate experience as opposed to any alternatives. 

The stability of this relationship appears to be great. For once 
these two factors—custom and immediacy—have interacted and 
established a balance, this balance appears to resist change long 
after the experiences which produced it have ceased to exist. It 
may be that the relative appeal of these two factors is determined
by two personality syndromes which remain stable and separate
within the student population. Here, again, may be a point of
interest to those who debate the relative merits of TV or
correspondence study as substitutes for overcrowded and
understaffed classrooms.

² The same sixty-item, four-alternative, multiple choice achievement
test was used in the pre-course, post-course, and durability measurements.
This test as a whole probably became increasingly familiar to all
three groups; and in addition, as knowledge of course content grew, more
and more alternatives per item were probably immediately ruled out as
possible correct answers. Thus, during the four months’ retention period (when
all groups again became equal in the interaction variables) the greatest gain
in opinion convergence was made by the correspondence group, the next
greatest by the kinescope group, and the least by the classroom group. This
is exactly the order in which the three groups fell in tested achievement
both at the beginning and end of the course and after the retention period.

Students’ end-of-course evaluations of its personal value to them 
also appeared to be conditioned (though less powerfully and far 
less reliably) by the perceived comparability of their groups’ 
instructional treatments to customary patterns of instruction. 
Thus, in their immediate post-experimental ratings the “traditional”
classroom group unanimously rated the course more
valuable than others taken concurrently, while the “novel”
correspondence group’s mean rating was about halfway between
greater and average value, and the “intermediate” kinescope 
group’s mean rating was little better than “of average value.” 
The effect of time passage following an instructional experience 
appears to be a leveling, or a regression toward mediocrity in the 
student’s memory. This is indicated by the durability measures 
for both the classroom and correspondence groups, although the 
previously low kinescope group made no post-course change at all. 

A final inference drawn from these findings might suggest 
that—contrary to what seems to be popular practice—the greatest 
chances for the success of educational TV (or any new instructional 
medium) may lie in the unabashed exploitation of the medium’s 
intrinsic advantages—precisely, the extension of detailed visi- 
bility—rather than in the cautious aping of tried and proven class- 
room techniques. 


SUMMARY 


Kinescope-correspondence instruction apart from an instructor, 
customary classroom experience, and independent correspondence 
study were compared for their relative effectiveness as methods of 
providing university-level instruction in identical subject matter. 
Subjects were randomly assigned among the three experimental 
treatments and a control group was used. All groups were compared 
in achievement, and comparisons were made among the three 
experimental groups, only, on congruence of course-related opin- 
ions, preferences among the methods, and subjects’ ratings of the 
course’s value to them. Comparisons were made between the kine- 
scope group and classroom group, only, on cohesiveness and group 
structure. Measurements of these variables were made prior to
instruction, midway in the course, and after a four months’ time 
lapse. 

Throughout the course it was found that although no significant 
differences appeared among the experimental groups’ achievement, 
cohesiveness, and ratings of the course’s personal value, all three 
experimental groups achieved very significantly more and faster 
than the control group. In addition, comparisons of the amount of 
congruence of content-related opinions indicated a significant 
decrease in congruence for the correspondence group, but no signifi- 
cant changes for the other two treatments. Measures of the degree 
of group structure showed a significant increase during the semester 
for the classroom treatment and a not quite significant decrease 
for the kinescope group; thus, comparisons of these two groups at 
the end of the course revealed a highly significant difference be- 
tween them. The subjects’ preferences among experimental treat- 
ments bore a highly significant positive relationship to the treat- 
ments to which they had been assigned; however, almost all 
preferences which deviated from this tendency were accounted 
for by the instructorless groups who preferred the classroom 
experience—the correspondence students showing greater prefer- 
ence for their own group, and the kinescope students tending 
to show greater preference for the classroom. Subjects’ ratings of 
the course’s personal value were highest for the classroom group 
and lowest for the kinescope group; and they bore a nonsignificant 
positive relationship to the experimental treatments arranged 
in the order of most (classroom) to least (correspondence) simi- 
larity to customary educational methods. 

Four months after the course the correspondence group had 
retained significantly more than the classroom group in tested 
achievement, and the classroom group remained significantly 
higher than the kinescope group in group structure. In addition, 
the same highly significant positive relationship was found between 
subjects’ preferences among experimental treatments and the 
treatments to which they had actually been assigned that had 
appeared throughout the course—the deviates from this trend 
being distributed between own group and classroom group in the 
same manner as at the end of the course. 

Within the limits imposed upon generalization by the small 
sample used in this study, these findings are taken to suggest 
that kinescope or TV techniques are at least as effective as (and
independent correspondence study is probably more effective
than) conventional class discussion methods for promoting durable
factual achievement, alone, in abstract or highly verbal academic 
subjects. All of the methods, of course, are more effective in this 
regard than no systematic instruction at all. In addition, the early 
social mediation of an instructor tends to produce greater familiarity
among students in the classroom than in the instructorless
kinescope treatment; and students’ preferences for their own 
instructional treatments tend to start low and increase through 
time under the unfamiliar correspondence method, whereas they 
tend to decrease with time for the partly familiar kinescope method 
and remain uniformly high under the customary classroom method. 
THE MTAI IN AN AMERICAN MINORITY-GROUP
SCHOOL SETTING¹


I. Differences Between Test Characteristics for 
Norm and Non-norm Populations 


JOSHUA A. FISHMAN 


College Entrance Examination Board and The City College, College
of the City of New York 


Since the publication of the Minnesota Teacher Attitude Inven- 
tory (MTAI) (3) a number of studies have appeared touching 
upon the reliability, validity, and factorial composition of the
scores obtained upon this instrument (1, 2, 4, 5, 11, 14). Future
users of this inventory have been counseled to regard it as still 
in an experimental stage due to the fact that many questions con- 
cerning the criteria which the test predicts, the interpretation of 
its scores, and the stability of its norms remain unanswered. Each 
large-scale use of this instrument must, therefore, be treated as a 
research undertaking which attempts to secure partial answers to 
the above questions in a limited setting. By sharing their insights 
into the functioning of this instrument, investigators will finally 
be able to evaluate it properly. The present study is an attempt to 
investigate certain MTAI test characteristics in a sample drawn 
from the teacher population of various ideological-structural types
of American Jewish schools in Greater New York City. The score- 
related characteristics here under study will be (a) type of subject 
taught, (b) age, (c) sex, (d) amount of post-high school education, 
and (e) foreign vs. American birth. 

(a) In her original instrument Leeds found teachers of “special
subjects” (e.g., art, music, etc.) to score lower than the regular 
classroom teachers (10, 12). This finding was also obtained with 
the normifying population (3). 

(b) Leeds also found a significant negative relationship between
inventory scores and age (10, 12). However, her collaborators in
the normifying study ruled the age factor out statistically by
deleting all items which originally showed sharp age differences (3).

(c) The MTAI norms for elementary school teachers are based
entirely on females, due both to the small percentage of males to be
found generally in American public elementary education and
to the particular unavailability of males in the grades during
wartime. Nevertheless, Leeds mentions “tendencies for male
teachers to score lower than females” (12).

(d) The MTAI Manual reports that in the normifying population
of over seventeen hundred cases the “amount of post-high-school
education was significantly and positively related to teacher
attitudes in graded elementary and high schools” (3).

(e) Finally, in arriving at the norms reported in the MTAI
Manual, no foreign-born vs. American-born comparisons were
made. Leeds, however, comments that she “also noted a tendency
for foreign-born teachers to score somewhat lower than American
born” (12).

¹ The writer is indebted to Dr. Irving Lorge and Dr. Isidore Chein for
their criticisms and suggestions in connection with the original study from
which this report is an excerpt, although he alone assumes full responsibility
for the data and interpretations here presented.

In this study an attempt will be made to trace the relationship 
between these five test characteristics (as obtained from the 
normifying population of American midwestern public school 
teachers) and the MTAI scores of a population largely different 
from that upon which the norms were constructed. The ideological 
characteristics of four of the five subgroups into which this popula- 
tion has been divided for the purposes of stratified sampling and 
data analysis will not be delineated here as this has been done in 
detail elsewhere (7). The fifth subgroup of teachers, namely that 
labeled “Arts,” is composed of teachers employed in all of the four 
preceding school types. However, rather than being teachers of 
traditional academic subjects these teachers teach singing, dancing, 
dramatics, or arts and crafts. 

The sample upon which the following data were obtained was 
randomly selected as part of a larger, unpublished study (6) which, 
in turn, was part of a three-year survey of a score or more aspects 
of the educational activities of this American minority group. 

Due to the non-normality as well as the lack of homoscedasticity 
of the obtained scores, the data were analyzed by Friedman’s two- 
way nonparametric test of analysis of variance of means by ranks 
(9), rather than by the more searching standard technique. 
TABLE I. MEAN MTAI SCORES BY AGE AND SCHOOL TYPE

                                     Age Groups
                   20-29         30-39         40-49         50-59
School Types      n       x̄     n       x̄     n       x̄     n       x̄
1. OAD           13    60.1     8    48.8     2     6.5     8    20.0*
2. OWD, CWD      18    52.4     6    44.8    17    21.4     5   -10.2
3. ROD           11    62.4†   10    40.8     7    49.8     2    15.0
4. n-rSWD         1   (42.0)    2    31.5     4    19.0     4     9.5*
5. Arts           4    27.8†    5     1.2     1   (30.0)    1    (6.0)
Totals           47    54.6    31    36.6    31    26.8    21     9.2

* Includes the score of one teacher older than fifty-nine years.
† Includes the score of one teacher younger than twenty years.


1. MTAI Scores by Age Group 


The data in Table I unambiguously point toward a highly
negative relationship between age and mean MTAI scores. The
analysis of variance of means by ranks produces a χ² value of 10.68
for the data as a whole across all of the school types. As χ² with
4 df equals 9.5 at the 0.05 level, our analysis reveals that the
differences between the mean MTAI scores of the various groups are
significant at this level.
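
The ranks analysis used here can be sketched in a few lines of Python. The sketch assumes the usual form of Friedman's statistic, 12 / (nk(k + 1)) times the sum of squared rank sums, minus 3n(k + 1), for n blocks and k ranked conditions, referred to chi-square with k - 1 degrees of freedom. The scores are hypothetical, and the report does not say which classification served as blocks; only the figures 10.68, 4 df, and 9.5 come from the text.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square (no tie correction); rows are blocks, columns are conditions."""
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def chi2_upper_tail_4df(x):
    """P(chi-square with 4 df > x); a closed form exists for even df."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Hypothetical scores (NOT the study's data): 4 blocks by 5 conditions.
hypothetical_scores = [
    [5, 4, 3, 2, 1],
    [5, 3, 4, 1, 2],
    [4, 5, 3, 2, 1],
    [5, 4, 2, 3, 1],
]

print(round(friedman_statistic(hypothetical_scores), 2))
print(round(chi2_upper_tail_4df(10.68), 3))  # tail area for the reported value
print(round(chi2_upper_tail_4df(9.488), 3))  # tail area at the tabled 0.05 point
```

Under these assumptions the reported value of 10.68 corresponds to a tail probability of about 0.03, consistent with the statement that it is significant at the 0.05 level. With few blocks the chi-square reference distribution is only approximate.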

Cook, Leeds, and Callis, in the MTAI Manual, report that in
tryout versions of this instrument the correlation between age and 
score was —0.30. When they divided their sample into two age 
groups, namely, “below forty” and “forty and above,” they found 
the former to score significantly higher than the latter (at the 0.01 
level). These investigators comment that the observed difference 
is due not only to a constant worsening of teacher attitudes toward 
children as teacher age increases, but that it is probably also due to 
the fact that “many older teachers did not have the advantage of 
the recent emphasis, in teacher training, on educational psychology 
and on the understanding of child behavior.”

As stated before, age was no longer significantly related to scores 
in the normifying population due to the deletion of items which 
showed sharp age differences. That age is still significantly related 
to scores in our current sample may be indicative of greater attitudinal
differences toward children among the various age groups
represented within the population of teachers in schools of this
particular American minority group than was the case for the
normifying population. The inapplicability of the general norms for
teachers of this minority group, at least for more than rough
within-group comparative purposes, seems clear, at least within the
framework of the age factor.

Age itself, however, is usually of sociopsychological interest as 
an index of types or degrees of exposure. We will see below that 
age is not as important a variable in itself as it is a concomitant of 
other dynamic factors imbedded in the matrix of teacher-pupil 
interaction within the schools of this minority group. 


2. MTAI Scores by Sex and American or Foreign Birthplace 


TABLE II. MEAN MTAI SCORES BY SEX, BIRTHPLACE AND SCHOOL TYPES

                     American-Born                Foreign-Born
                   Male         Female         Male         Female
School Types      n       x̄     n       x̄     n       x̄     n       x̄
1                 3    16.7    12    69.8     9    17.7     7    42.6
2                21    45.8    11    48.0     9    -2.1     5    10.0
3                13    48.2    14    55.0     2    30.0     1   (17.0)
4                 1   (83.0)    1   (42.0)    4   -10.2     5    27.2
5                 4    11.0     6    31.8     1    (6.0)   --      --
Totals           42    39.9    44    53.8    25     6.6    18    27.8

Birthplace totals:
  American-born, n = 86; x̄ = 47.0
  Foreign-born, n = 43; x̄ = 15.5
Sex totals:
  Males, n = 67; x̄ = 27.5
  Females, n = 62; x̄ = 46.3

The data in Table II point toward a negative relationship
between mean MTAI scores and (a) male sex and (b) foreign birthplace.
The overall difference between the sexes is somewhat slighter
than the differences between the birthplace groups. The analysis of
variance of means by ranks again produces a χ² value of 10.68
which, once more, is significant at the 0.05 level. The differences
between the mean MTAI scores of the various sex-birthplace
groups are thus significant over the data as a whole. We pause,
however, to inquire whether the four sex-birthplace combinations,
taken separately, also differ significantly from each other. Utilizing
the Link-Wallace method (13), described in section 4 below, we
find that such is the case for the mean score differences between
American-born females and foreign-born males or females, as well
as between American-born males and foreign-born males. Neither
of the within-birthplace sex-differences is significant, however.
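
As a check on Table II, the birthplace and sex totals can be recovered from the four column totals by weighting each subgroup mean by its n. The short Python sketch below uses only the figures printed in the "Totals" row; the function name is illustrative.

```python
def pooled_mean(groups):
    """Combine (n, mean) pairs into a total n and an n-weighted mean."""
    total_n = sum(n for n, _ in groups)
    weighted = sum(n * mean for n, mean in groups) / total_n
    return total_n, weighted

# "Totals" row of Table II: (n, mean MTAI score)
american_male = (42, 39.9)
american_female = (44, 53.8)
foreign_male = (25, 6.6)
foreign_female = (18, 27.8)

marginals = {
    "American-born": pooled_mean([american_male, american_female]),
    "Foreign-born": pooled_mean([foreign_male, foreign_female]),
    "Males": pooled_mean([american_male, foreign_male]),
    "Females": pooled_mean([american_female, foreign_female]),
}

for label, (n, mean) in marginals.items():
    print(f"{label}: n = {n}; mean = {mean:.1f}")
```

The four pooled values agree, to one decimal place, with the birthplace and sex totals printed beneath the table (47.0, 15.5, 27.5, and 46.3).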

In arriving at the norms reported in the MTAI Manual no 
birthplace-group comparisons were made. It is highly probable, 
however, that the vast majority of American midwestern public 
school teachers upon whom this instrument was standardized were 
American-born.²

The significant differences between the birthplace-groups can 
probably be ascribed to differences in philosophy of education 
encountered in teacher training, as well as to differences in the 
psychological atmosphere of the family, school, and total adult- 
child interaction during the teacher’s own childhood and youth. 
In addition, the birthplace category interacts appreciably with the 
age variable, as the foreign-born teachers are, generally, older than 
the American-born teachers. The average age of the former is 
forty-five years, whereas that of the latter is thirty-two years. 

The sex difference in mean MTAI score is an interesting
although not entirely unexpected finding. Many boards of education
throughout the country do not engage males to teach in the lower
elementary grades,³ notwithstanding the shortage of teachers.
This policy is due to a conviction that females are better suited to
the teaching of young children. In addition to some possible
difference in the degree to which males and females may
understandingly empathize with young children, there are probably also
some psychological needs of young children which females can
better meet than males. That any sex differences which may exist
in teacher attitudes toward young children cannot be very great
among American public school teachers in general is attested to by
the fact that Cook, Leeds and Callis report that “men and women
graduate students (majoring in education, with an average of
ten years’ teaching experience) . . . have mean MTAI scores which
are not significantly different” (3).

² The preponderantly middle-western nature of the normifying population
also allows us to deduce the modal nature of other sociopsychological
characteristics which differentiate between it and the current sample.

³ In New York City, e.g., males are not engaged to teach below the fifth
grade.

The significant difference between mean MTAI scores of males
and females in our minority group sample may be explained by 
factors additional to those mentioned above. The males in our 
sample are both older (males: average age = forty; females: 
average age = thirty-three) and to a larger extent foreign-born 
(males: thirty-seven per cent foreign-born; females: twenty-nine 
per cent foreign-born). In addition, traditional Jewish education 
for males has been more demanding and more subject-centered 
than that for females. To some extent current teachers of this 
minority group may be reacting to their pupils and to their rôles
as teachers in the light of their own childhood and adolescent
experiences with Jewish education and with the teacher-pupil
rôles considered appropriate in those times.

These sex and birthplace differences in mean score are not
provided for in the general norms. Some such provision is obviously
needed for interpreting scores of samples drawn from the universe
of this minority group’s teachers.


3. MTAI Scores by Extent of Secular Education


The data in Table III clearly indicate a positive relationship 
between extent of general secular (as opposed to minority-group 
sponsored) education and mean MTAI scores. The analysis of 
variance of means by ranks produces a χ² value of 9.96. Since
χ² with 4 df is 9.5 at the 0.05 level, the over-all differences between
the mean MTAI scores of the various secular educational levels 
are significant at this level. 
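
The “analysis of variance of means by ranks” is taken here to be the Friedman procedure, in which the conditions are ranked within each block and the resulting statistic is referred to the chi-square distribution with k − 1 degrees of freedom. That identification is an assumption, since the text does not name the procedure. The sketch below is illustrative only: the block means are hypothetical (they are not the Table III data) and the function names are not part of the original analysis.

```python
def rank_within_block(values):
    """Rank one block's values (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(blocks):
    """Friedman statistic for n blocks (rows) by k conditions (columns).

    No correction for ties is applied in this sketch.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for col, rank in enumerate(rank_within_block(block)):
            rank_sums[col] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Tabled chi-square critical values for 4 degrees of freedom.
CHI2_CRIT_4DF = {0.05: 9.49, 0.01: 13.28}

# Hypothetical mean scores: 4 blocks, 5 conditions (so df = 4).
hypothetical_blocks = [
    [10, 20, 30, 40, 50],
    [12, 25, 28, 45, 60],
    [5, 15, 35, 30, 55],
    [8, 22, 31, 44, 52],
]

statistic = friedman_chi_square(hypothetical_blocks)
significant_at_05 = statistic > CHI2_CRIT_4DF[0.05]
```

With five conditions the statistic is compared against the 4 df critical values, which is how the 9.5 and 0.05 figures in the text are read here.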

As stated before, amount of higher education was found to be 
positively and significantly related to MTAI scores in the normify- 
ing population of seventeen hundred cases. 

In our current sample the effect of increased secular education,
as might be expected, interacts, to some extent, with the age and
birthplace findings discussed previously. The older, European-born
and -trained teachers were also found to have received less formal
education of a general secular variety than have the younger,
American-born teachers (6). However, there is also much to be
said in favor of an appreciable effect for a straight “secular
education” factor. Increased secular education most frequently
signifies increased familiarity with concepts of modern education,
educational psychology, child development, mental hygiene, etc. On
the whole, an increased awareness and concern for the emotional
rights and needs of children may be expected to parallel increased
exposure to modern views in these subject areas.


TABLE III. MEAN MTAI SCORES BY EXTENT OF
GENERAL SECULAR EDUCATION

                      Extent of Secular Education
School         NC*            C              B              M
Types        n    Mean      n    Mean      n    Mean      n    Mean

1            3    12.3      9    30.7     14    43.1      5    73.0
2            2   -10.0     16     5.8     20    49.0      8    58.4†
3           --     --       5    42.8     13    38.1     12    61.3†
4            3   -17.5      6    45.7     --     --       2   -10.0
5            2    10.0      4    15.0      1  (-28.0)     4    35.7

Totals      10     2.8     40    22.9     48    42.7     31    53.4

* NC = no college; C = college attended, no degree; B = bachelor’s
degree; M = master’s degree.
† Includes the score of one Ph.D.

In this respect, then, the MTAI is functioning for this non-norm
group in much the same manner as it was intended to function for 
samples much more similar to the normifying population. 

Increased secular education may be related to mean MTAI 
scores in yet another way, namely, via increased test sophistication. 
As a fakable instrument (14) the MTAI would be most vulnerable 
to faking by those most experienced with modern objective tests, 
i.e., by those groups with the greatest amount of general secular 
education. To the extent that this relationship between faking of 
test response and higher educational attainment is present in our 
data (as well as in the norm data), to that (unknown) extent are 
our findings suspect. All that can be said in the current connection 
is that respondent-anonymity was permitted to encourage com- 
plete frankness in the filling out of this questionnaire. 







48 The Journal of Educational Psychology 


4. MTAI Scores and Subject Matter (“Academic” vs.
“Arts”) Taught


From all of the foregoing tables, and especially from Table IV
which follows, it is obvious that the teachers in group 5, i.e., the
teachers of art, music, dancing, dramatics, or arts and crafts,
obtained a lower mean MTAI score than did any other group of
teachers. This finding is in agreement with the findings reported by
Leeds (10, 12) and with the information given in the Manual (8).
At this point, however, we pause to carry the analysis one point
further and ask whether the mean score of the “arts” teachers is
significantly different from the means obtained by the other
teachers.


TABLE IV. MEAN MTAI SCORES OF TWO SUCCESSIVE SAMPLES
OF TEACHERS IN FIVE TYPES OF SCHOOLS

School        First Sample          Second Sample             Total
Types        n    Mean   S.D.      n    Mean   S.D.      n    Mean   S.D.

1           26    48.9   45.0      5    45.8   34.5     31    44.2   43.4
2           41    33.6   47.8      5    28.4   53.5     46    33.0   48.4
3           25    47.6   40.2      5    56.6   25.7     30    49.1   63.5
4            2    64.0   22.0      9    10.1   41.8     11    19.9   44.2
5            8    11.0   50.6      3    21.7   20.0     11    13.9   44.7

Totals     102    40.2   44.2     27    33.1   40.3    129    36.1   45.5
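
The Total columns of Table IV appear to pool the two successive samples, so that each total mean would be the n-weighted average of the two sample means. A minimal sketch of that arithmetic follows, using the figures printed for school types 2 and 4; the function name is illustrative and is not part of the original analysis.

```python
def pooled_mean(samples):
    """Combine (n, mean) pairs into one mean weighted by sample size."""
    total_n = sum(n for n, _ in samples)
    weighted_sum = sum(n * mean for n, mean in samples)
    return weighted_sum / total_n


# (n, mean MTAI score) for the first and second samples, as printed in Table IV.
school_type_2 = [(41, 33.6), (5, 28.4)]
school_type_4 = [(2, 64.0), (9, 10.1)]

total_mean_2 = round(pooled_mean(school_type_2), 1)
total_mean_4 = round(pooled_mean(school_type_4), 1)
```

Both results can be checked against the Total column for the same rows of the table.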

A somewhat roundabout attack on this problem is necessary, due
to the regrettably unsatisfactory nature of the reply which analysis
of variance designs give to questions of the type: “Is column (or
row) X significantly different from column (or row) Y or Z?” (18).

We will, therefore, first enquire whether the five groups of 
teachers differ significantly over the data as a whole. If the answer 
to this question is in the affirmative, we will then enquire as to 
which particular groups differ in total mean scores from the others. 
In this roundabout manner we can determine from which, if any, 
of the groups of teachers of academic subjects the arts teach- 
ers might differ significantly. 

To answer these questions we will utilize the data of Table IV. 
These data were originally collected in order to determine whether 
additional sampling was necessary before embarking on the terminal
analyses of the category comparisons reported in sections 1 to
3 above. 

The data in Table IV indicate a highly consistent rank order 
across the two samples for the mean MTAI scores of the five groups 
of teachers. The analysis of variance of means by ranks produces a 
χ²ᵣ value of 32.8. Since χ² with 4 df is 13.3 at the .01 level, the
over-all differences between the mean MTAI scores of the various 
types of teachers may be considered highly significant. This being 
the case, we may now proceed to enquire whether the mean score 
of the teachers in group 5 departs significantly from those of the 
other four groups. 

Mosteller and Bush (13, pp. 305-307) present a nonparametric
method, first proposed by R. F. Link and D. L. Wallace, for tackling
such problems in two-way analyses of variance with one observation
in each cell. This method employs the range as a measure
of variation rather than sums of squares. It has here been employed
after converting means to ranks. The “allowance for column totals”
obtained when employing this method turns out to be so large
that we are justified in “recognizing” a difference between the
total mean score of arts teachers and the total mean of only one
other teaching group, namely group 3. The obtained size of the
“allowance” is large, due to the small number of rows and columns
employed in our analysis. Thus we are forced to conclude that the
mean score of the “arts” and other “special” teachers, although
low indeed, is not significantly different from the mean scores of
most of the various types of teachers of academic subjects.⁴ In this
respect, therefore, we may conclude that although the MTAI
functions on this non-norm group similarly to its functioning on the
norm population, the reliability of this phenomenon is not one
which we may accept with complete confidence.

⁴ Only nonsignificant differences resulted from applying to our data the
well-known Tukey (15) test for significance of gap after performing a
common analysis of variance with one observation in each cell.


SUMMARY AND CONCLUSION


When the MTAI was administered to teachers from a non-norm
universe, a number of the characteristics which this instrument
reveals with American midwestern public school teachers no longer
obtained, while other instrument characteristics continued to
obtain.

Those characteristics of the MTAI which obtain for the norm 
and norm-similar populations, but which did not obtain for this 
sample from a universe of teachers in schools of an American 
minority group were: (a) the constancy of MTAI scores over 
increased age. 

Those characteristics of the MTAI which obtain for the norm 
and norm-similar populations, and which did also obtain for this 
sample from a universe of teachers in schools of an American
minority group were: (a) the lower scores for teachers of “special
subjects”; (b) the higher scores for teachers with greater amounts
of post-high school education.

In addition, it was discovered that mean MTAI scores varied 
significantly with two other independent variables not employed 
in the MTAI normifying studies: (a) sex, males scoring significantly 
lower than females, (b) birthplace, foreign-born teachers scoring 
significantly lower than American-born teachers. 

Although the teachers of all school types obtained relatively 
low MTAI scores when compared with the norm of American 
public school teachers, the highest over-all mean scores were 
obtained for school types in which young, American-born females
with advanced college degrees predominate on the teaching staffs
(type 1 and, particularly, type 3). Conversely, those school types
in which most teachers are old, foreign-born males with little
formal general (secular) education obtained the lowest over-all
mean scores.⁵

Although departing somewhat from its functional and structural 
characteristics as a measure of teacher attitudes toward children 
in the norm-population, the MTAI scores in this non-norm popu- 
lation still varied meaningfully with independent variables affect- 
ing teacher attitudes and classroom performance. 

A final question which remains as we bring these comments to
a close is: does the MTAI continue to measure degree of teacher
orientation toward and acceptance of the child’s emotional and
developmental needs when it is administered in such a non-norm
universe? A partial answer to this question will be found in a
subsequent publication (8).

⁵ In a further study it was demonstrated that, of these four variables,
extent of secular education was the most significantly related to
MTAI scores (the three other variables being held constant), followed by
sex as the second most significant variable. The contributions of age and,
particularly, of nativity were negligible (6).
KINDERGARTEN TRAINING AND
GRADE I READING¹


IRENE FAST²


University of Michigan 


The influence of kindergarten training on the later development 
of the child has been a research problem since the beginning of 
the century when kindergartens were becoming increasingly wide- 
spread on this continent. The results of investigations have sug- 
gested that children with kindergarten training advance more 
rapidly in reading (4, 5, 7) and arithmetic (1), and are generally 
more advanced throughout the elementary school (2, 6). Despite 
the consistency of these findings, however, the question remains 
open because of the difficulty of conducting systematically con- 
trolled research in this area. 

Specifically, it has been impossible to obtain comparable groups 
of children with and without kindergarten training. Where at- 
tendance at kindergarten is voluntary, as it apparently was in 
all studies, uncontrolled differences in home background may 
exist between children who attend kindergarten and those whose
parents neglect, refuse, or are for some reason unable to send them. 
Differences in age of entrance into grade I exist in at least one 
case (2), and may be present in others where no information is 
given. Finally, where children with kindergarten training and 
those without have not attended the same schools, as seems to 
be the case in the MacLatchy (6) and Brueckner (2) studies, 
differences in educational policy and administration may have 
influenced the results to an unknown extent. 

It seems clear that until investigations controlling these factors
can be made, no final conclusions can be drawn. Fortunately the
present writer was able to make use of an existing situation in
which age, home background, and school environment of the
subjects could be controlled. In the three urban schools in which
the study was conducted, children whose fifth birthdays fell in
November and December of the current school year (1953) were
excluded from kindergarten training because accommodation was
limited. The following year they were admitted to grade I and
placed in the same classrooms as first grade children who had
received kindergarten training. Thus it was possible to examine
differences between one hundred and thirty-four children with
kindergarten training and forty-six children who had no opportunity
for such training. No difference between the two groups
other than age was introduced by the criterion of selection for
kindergarten training.

¹ This investigation was supported by Canadian Federal Health Project
605-5-147.

² The writer gratefully acknowledges the assistance of Dr. R. L. Cutler
in the preparation of this manuscript for publication.

It was decided to focus the study on reading progress in the 
first grade, both because reading is generally considered to be of 
primary importance in grade I, and because standardized tests of 
reading are readily available. It was predicted that initial reading 
scores of children with kindergarten training would be higher than 
scores of children without such training, and that this advantage 
would be maintained throughout the school year. 


METHOD 


Subjects. All grade I children enrolled in the three urban schools 
were given a battery of tests. The test results of one hundred and 
thirty-four children with kindergarten training and forty-six 
children without such training were utilized in the present study. 
The test results of children who were repeating grade I and of 
those absent during one or more of the testing sessions were not 
used. 

Materials and Procedure. Following is a list of tests included 
in the battery administered to the above-described population: 

(1) The Dominion Group Test of Reading Readiness, Short 
Form, Omnibus Type B was administered to all grade I children 
in the three schools four weeks after the beginning of the fall 
term. It was given at this time in order that the children would 
have had some opportunity to become accustomed to the classroom 
situation without having had a great deal of formal reading in- 
struction. 

(2) The Survey Test in Silent Reading (First Experimental 
Edition, now being standardized) was administered at the begin- 
ning of February to discover whether differences between the 
kindergarteners and nonkindergarteners were still evident. The 
two parts, Word Recognition and Paragraph Reading, were treated
separately in the comparison of groups. 

(3) The Achievement Tests in Silent Reading; Primary, Grade 
I, type III; Paragraph Reading Test; and The Group Test of 
Learning Capacity, Primary, were administered in May. 


RESULTS 


The initial requirement in presenting the results of this study 
is to demonstrate that no differences in age or home background 
influenced the observed differences in reading scores. 

Seven of the one hundred and thirty-four kindergarteners had 
birthdates in November and December and had received kinder- 
garten training in other schools. There seemed to be no a priori 
reason for excluding these from the sample. Twenty-one of the 
forty-six nonkindergarteners’ birthdates were between January 
and October. These children had not been excluded from kinder- 
garten but had failed to attend, probably because their parents 
had been unable to arrange for transportation and they lived some 
distance from school (maximum five miles). To investigate the 
possibility of differences in reading ability or I.Q. between these 
children and the nonkindergarteners whose birthdates fell in 
November and December, t tests were made. No significant difference
in either reading or I.Q. was found. Therefore the forty-six
nonkindergarteners are treated as one group throughout the study. 

No difference between kindergarteners and nonkindergarteners 
in home environment was introduced by the criterion of selection 
for kindergarten training. To explore the possibility that differences 
in economic status existed between the groups, the occupations 
of the fathers of the subjects, where they were available, were 
plotted on a six-point scale published in the 1947 census data (3). 
A chi-square test showed no significant difference in economic
status between fathers of kindergarteners and nonkindergarteners.
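
A comparison of this kind can be sketched as a chi-square test of independence on a 2 × 6 table (two groups by six occupational classes). The counts below are hypothetical, since the frequencies are not reported in the text, and the function name is illustrative; the sketch shows only the form of the computation, not the original figures.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and class.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts of fathers in six occupational classes.
hypothetical_counts = [
    [10, 20, 30, 30, 20, 10],  # kindergarteners
    [4, 7, 10, 10, 6, 3],      # nonkindergarteners
]

# Tabled chi-square critical value for 5 degrees of freedom at the .05 level.
CHI2_CRIT_5DF_05 = 11.07

statistic, df = chi_square_independence(hypothetical_counts)
significant = statistic > CHI2_CRIT_5DF_05
```

With nearly proportional rows, as in these invented counts, the statistic falls far below the critical value, which is the pattern a finding of “no significant difference” implies.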

Since only children who were five years of age before November
1, 1953, were admitted to kindergarten, and most nonkindergarteners
reached age five in November or December, 1953, the
kindergarteners were slightly older than the nonkindergarteners.
Scores of kindergarteners and nonkindergarteners on all tests
were plotted against age. Because no relation between scores and
age within the one year period which included all dates of birth
was found, it was concluded that age difference within this period
did not significantly influence reading scores.


TABLE I. THE MEANS ON FOUR TESTS AND SIGNIFICANCE OF DIFFERENCES
BETWEEN KINDERGARTENERS AND NONKINDERGARTENERS
EQUATED ON I.Q. AND M.A.

Month of Test                           Mean Scores      Subjects Equated on
Administration   Test                   K       Non-K    I.Q.      M.A.

October          Reading readiness      8.67    4.03     0.0025    0.0025
February         Word recognition      13.71   11.83     0.028     0.013
February         Paragraph reading      5.45    3.96     0.018     0.005
May              Paragraph reading     13.30    9.54     0.0015    0.035

Differences in I.Q. (scores on the Dominion Group Test of 
Learning Capacity, Primary) were controlled by matching kinder- 
garteners and nonkindergarteners. The means of reading scores 
in each interval of five I.Q. points were computed for each group
on each reading test. The mean scores of kindergarteners and 
nonkindergarteners were compared using a one-tailed t test for 
matched individuals. The significance of differences is shown in 
Table I. 
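
The matched comparison described above can be sketched as a t test on paired interval means, one pair per I.Q. interval. The interval means below are hypothetical and the function name is illustrative; the sketch assumes that each I.Q. interval supplies one kindergarten mean and one nonkindergarten mean, which is one reading of the procedure and may not match it in every detail.

```python
import math


def paired_t(first, second):
    """t statistic and degrees of freedom for matched pairs (first minus second)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1


# Hypothetical mean reading scores in successive five-point I.Q. intervals.
kindergarten_means = [6.0, 7.5, 8.0, 9.5, 10.0, 11.5]
nonkindergarten_means = [3.0, 3.5, 4.5, 4.0, 5.5, 6.0]

t_value, df = paired_t(kindergarten_means, nonkindergarten_means)
```

Because the prediction was directional (kindergarteners higher), a one-tailed test refers only a positive t to the tabled value for the given degrees of freedom.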

To control both age and I.Q., the groups were then equated 
as described above, using five-month mental age intervals. As
is shown in Table I, the groups differed significantly on each test.
The similarity between results when children were equated on 
I.Q. and M.A. scores corroborates the finding that age differences 
within and between groups did not significantly affect the reading 
scores. 


DISCUSSION 


In general the results of previous studies (4, 5, 7) have been 
confirmed: kindergarteners achieve significantly higher scores in 
reading tests than nonkindergarteners. At least three further 
considerations determine the value of the results. 

The first concern is the generality of the results beyond the 
schools in which the study was conducted. The three schools were 
within the public education system. Because a chief area of
variability within authorized programs concerns formal reading in
kindergarten, it should be mentioned that, while explicit reading
readiness programs were followed, reading was not formally taught
in the kindergartens of any of the three schools. The use of widely
distributed, standardized reading series in the first grade suggests 
that reading experiences were broadly similar to those found 
throughout this continent. The quality of instruction was not 
unduly influenced by any single teacher because the children were 
distributed among three teachers in kindergarten and among 
nine in the first grade. These factors suggest that similar results 
would be obtained if the study were repeated elsewhere. 

The other two questions concern the extent to which similar 
differences may be found in other aspects of development and 
the length of time in the academic progress of the children that 
differences in achievement persist. At present these questions 
cannot be answered decisively. If the kindergarten reading readiness
program is most closely related to reading progress in grade I,
we would not expect to find differences between kindergarteners
and nonkindergarteners in other aspects of development. However,
the major part of the kindergarten program is devoted to more
general development in social skills, the ability to work independently
and in groups, and the acquisition of verbal and manual
skills. If these are the more important factors, differences between
kindergarteners and nonkindergarteners in subjects other than
reading might be found in grade II or later.

It is hoped that these questions may be clarified by an extension 
of the study during the current school year. 


SUMMARY 


Reading tests were administered at the beginning, middle, and 
end of the grade I year to one hundred and thirty-four children 
who had spent a year in kindergarten and forty-six otherwise 
comparable children who had not. Group tests of mental capacity 
were administered at the end of grade I. 

Children with kindergarten training were found to achieve 
significantly higher scores on all reading tests than children without 
such training. The possibility of clarifying the degree to which 
such differences existed in other academic areas and persisted 
in later years was explored. 
ARTHUR E. FINK, EVERETT E. WILSON AND MERRILL B. CONOVER.
The Field of Social Work, third edition. New York: Henry 
Holt and Company, pp. 630, 1958. 

An already good text has been greatly improved in the third 
edition. Concerning the addition of two authors, the Preface says: 
“One evidence of the acceleration as well as of the range within 
the social work field is reflected in the triple authorship of this 
revision . . . a volume that undertakes to deal with as extensive an
area as social work now demands the thinking and the experience
of other persons.”

The work is organized about sixteen main heads: the problems 
that people present; the development of social services—the 
European background; social services in America—from the 
almshouse to social security; social services in America—from the 
church to the charity organization society movement; social case- 
work; social services in a family-focused agency; services in a local 
welfare department; welfare services for children; psychiatric 
social work; medical social work; the correctional services; school 
social work; social services for the aged; social group work; com- 
munity organization for social welfare; and the profession of social 
work. The chapters on the problems that people bring to social 
agencies, on social casework, and on social services to the aged 
are new. New illustrative material has been added and the bibliog- 
raphies have been brought up to date. Sources for films are indi- 
cated. 

The historical chapters give a valuable background for the 
understanding of present-day social work. The complexity of 
modern social life is reflected in the greater complexity and difficul- 
ties of social work. The authors bring out the development or 
evolution of social services, relating the various steps to the
fundamental sciences upon which they are dependent (medical,
psychiatric, psychological, psychoanalytic) and the relations of
education, law and correctional procedures, etc., to the work as a
whole.

The development of social work as a profession is discussed, 
including a platform and an association, organizations, training, 
research, jobs, salaries, publications, etc. The new profession is not 
something which is to be taken up casually by those who have 
failed in some other work. It is something which now stands on
its own feet and is to be respected by those who understand it. 
Its interrelations with many organizations and funds are clearly 
indicated in this volume. 

For this tightly packed and unpadded book the present reviewer 
has little of adverse criticism. But although prevention is men- 
tioned and in several places implied, it seems that more on this 
most important subject might have found place; it does not appear 
in the Index. Mental Hygiene finds some place but not the more 
modern development under the now more common head of Mental 
Health. This most important problem of civilization might have 
had a little more attention. There is, however, considerable under 
Mental Illness. 

Case studies at the ends of chapters make a valuable contri- 
bution to the understanding of the social work problems. There are 
some footnotes, and especially excellent bibliographies at the 
end of chapters. 

The book will be useful not only as a text but also as a reference 
book for many details in the development of social work. 

A. S. EDWARDS

The University of Georgia 